SUBTELOMERIC DNA PROBES AND METHOD OF PRODUCING THE SAME



RELATED APPLICATION 
This application claims the benefit of application serial number 60/415,345, filed on
September 30, 2002, and application serial number 60/484,494, filed on July 2, 2003.
Additionally, the content and teachings of each of these provisional applications are hereby
incorporated by reference herein.

SEQUENCE LISTING 

This application contains a sequence listing in both paper format and on two identical
CD-ROMs filed herewith. The sequence listing on paper is identical to the sequence listing on
the two CD-ROMs and all are expressly incorporated by reference herein.

BACKGROUND OF THE INVENTION 

Field of the Invention

The present invention is concerned with chromosomal ends and subtelomeres and 
the detection of chromosomal rearrangements occurring in the subtelomeric regions of 
chromosomes. More particularly, the present invention is concerned with probes that can be used 
to identify such chromosomal rearrangements in medical and cancer genetic diagnoses. Still
more particularly, the present invention is concerned with single copy probes effective for
hybridizing to a single location in the genome wherein hybridization analysis will indicate
whether the chromosome has undergone any rearrangement at the telomere or subtelomere region.
Still more particularly, the present invention is concerned with single copy probes that are useful
for detecting a broader spectrum of abnormal chromosomal termini than currently detectable with
existing cloned probes, providing insight into how the telomere and subtelomere regions of
chromosomes are organized, correlating how the sequences of these chromosomal regions are
related to each other and to other chromosomal regions, correlating rearrangements with specific
clinical effects, and characterizing breakpoints in rare chromosomal rearrangements that are
genetically balanced and unbalanced. Finally, the present invention is concerned with methods
of making such probes.
Description of the Prior Art

Chromosomes are the DNA-containing cellular structures of organisms and are visible 
as a morphological entity only during cell division. Chromosomes consist of two chromatids. 
Each pair of chromatids forms a homolog, each having a short arm (the p arm), a long arm (the
q arm), a centromere connecting the long arm to the short arm, and a telomere at each end. After
pretreatment of the chromosomes with chemicals or heat, each of the arms exhibits alternating 
light and dark banding patterns that are a function of chromatin condensation. G-banding is in 
common use in clinical cytogenetics. R-banding, or reverse banding, is occasionally used and
produces the reverse pattern of light and dark G-bands. G-banded chromosomes will be referred to in this
application.

The centromere is a specialized protein-DNA structure in human chromosomes that binds 
the chromatids together and is responsible for accurate segregation of chromosomes in somatic 
cells and germ cells. The centromere is often visible as a constricted region in the chromosome 
and its position is responsible for determining whether the chromosome is metacentric,
submetacentric, or acrocentric. In metacentric chromosomes, the length of the p arm (or short
arm) is roughly equal to the length of the q arm (or long arm). In submetacentric chromosomes,
the length of the p arm is somewhat less than the length of the q arm. In acrocentric
chromosomes, the length of the p arm is much shorter than the length of the q arm. It is known
that acrocentric chromosomes have a specialized short arm comprised of highly repetitive DNA
sequences and multiple copies of genes for ribosomal RNA.

Telomeres are specialized protein-DNA structures that demarcate the ends of each 
chromatid in a chromosome. Typically, the telomeres are located in light G-bands, which are
gene rich and contain a lower density of repetitive sequences as compared to the dark G-band
regions. Because of their location in the light G-bands, exchanges and rearrangements between
the terminal ends (the telomeres) of chromosomes are difficult to detect visually. While
telomeres are not chromosome-specific, the subtelomeric or telomere-associated repeat sequences
immediately adjacent to them and also located in the light-staining G-bands can be
chromosome-specific. The telomeres themselves are composed of a TG-rich repeat of 3-20 kb in length, which
in vertebrates is (TTAGGG)n. This array is required to maintain chromosome stability by
preventing end-to-end chromosome fusions and exonucleolytic degradation. Additionally,
telomeres are needed for replication of DNA and have an important role in maintaining cell
longevity. Immediately adjacent to the TTAGGG tandem repeats are families of complex
repetitive DNA of up to several kilobases (kb) in length. These sequences tend to be present on 
multiple chromosomes, and are confined to the subtelomeric regions. Naturally occurring 
mutations in humans reveal that chromosomes lacking these repeats can be inherited normally,
suggesting that these sequences have no important biological role. Sequence analysis of DNA
adjacent to the 4p, 16, and 22q telomeres revealed interstitial degenerate (TTAGGG)n repeats
dividing the subtelomeric regions into distal and proximal subdomains with different degrees of
sequence similarity to other chromosome ends. The proximal subtelomeric sequence contains
long sequences common to a small number of chromosomes and the distal subtelomeric
sequences contain the previously described short complex repeats common to many
chromosomes. Additionally, chromosome-specific low-copy repeats or duplicons (i.e., paralogs)
can occur in multiple regions of the human genome including the subtelomeric regions. Trask
et al. identified members of the olfactory receptor gene family within a large segment of DNA that
is duplicated and has high similarity near many human telomeres. Intra- and interchromosomal
recombination between different duplicons in this gene family leads to chromosomal
rearrangements. The similarity between non-allelic copies of highly related sequences (>95%
homology) has made the subtelomeric domains extremely difficult to analyze at the molecular 
level. 

Subtle chromosomal rearrangements involving a gain or loss of the subtelomeric regions
(neighboring sequences) have been observed in 0-10% of individuals with idiopathic mental
retardation and other inherited clinical abnormalities. Other applications of subtelomeric probes 
include investigation of individuals with recurrent spontaneous miscarriages and infertility, 
characterization of constitutional and acquired chromosomal abnormalities, selected cases of 
preimplantation diagnosis, and diagnosis of abnormalities using interphase cells obtained by
either chorionic villus sampling or early amniocentesis.

Cytogenetically defined terminal deletions occur by three mechanisms: telomere 
regeneration or healing, retention of the original telomere producing interstitial deletions, and 
formation of derivative chromosomes by obtaining a different telomeric sequence, i.e., telomere
capture, through cytogenetic rearrangement. Because the majority of telomeric deletions are
probably stabilized by telomere regeneration, this suggests that the maximum number of terminal
deletions should be detected using probes that are as close to the telomere as possible. 
Due to the small size of these rearrangements and the presence of pale staining bands at
the ends of most chromosomes, the rearrangements are often not detectable by routine
cytogenetic methods that include G-banding or R-banding. Instead, they are detected by DNA
probe hybridization to chromosomes and fluorescence microscopy in a technique referred to as
fluorescence in situ hybridization (or FISH) or by microsatellite analyses. Unlike microsatellite
analyses which require that parental and/or other family members be studied in addition to the 
patient, FISH requires only the patient sample to detect the abnormality. Conventional FISH 
probes are generally between 60,000 and 170,000 base pairs in length with an average of about 
110,000 base pairs in length (rather than 5 million base pairs which is the average size of a
chromosomal band) and usually come from a portion of one chromosomal band. Therefore,
FISH can detect abnormalities not seen by routine cytogenetic methods. The probe hybridizes 
only to the homologous DNA sequences near the end of the chromosome arm. In normal 
individuals, there are 2 copies of the sequence (one from each parent) and thus, 2 sites of 
hybridization (one per chromosome of each homologous pair) in each cell. In patients with
unbalanced terminal chromosome rearrangements, there is a deviation in either the copy number
or location of the sequence, such that deletions are detected by the absence of hybridization from 
the end of the cognate chromosome and trisomies are detected by the presence of an additional 
hybridization signal on another chromosome. The chromosomal location of the hybridizations 
is immediately apparent from cytogenetic characterization of the chromosomes, enabling both
balanced and unbalanced translocations to be detected.
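
The signal-counting logic described above can be sketched in a few lines of Python. This is an illustrative simplification only: the function name, the two input counts, and the category labels are assumptions introduced here, and the sketch assumes a single autosomal probe scored in one cell.

```python
def classify_fish_pattern(signals_on_cognate_end: int, signals_elsewhere: int) -> str:
    """Classify one cell's hybridization pattern for a single subtelomeric probe.

    signals_on_cognate_end: signals at the expected chromosome end (hypothetical input).
    signals_elsewhere: signals at any other chromosomal location (hypothetical input).
    """
    total = signals_on_cognate_end + signals_elsewhere
    if signals_on_cognate_end == 2 and signals_elsewhere == 0:
        return "normal"      # one signal per homolog
    if total < 2 and signals_elsewhere == 0:
        return "deletion"    # signal absent from a cognate end
    if total > 2:
        return "gain"        # additional copy of the sequence
    if total == 2 and signals_elsewhere > 0:
        return "relocated"   # copy number normal, location changed
    return "complex"         # loss combined with relocation


if __name__ == "__main__":
    for pattern in [(2, 0), (1, 0), (2, 1), (1, 1)]:
        print(pattern, classify_fish_pattern(*pattern))
```

A "relocated" result corresponds to the balanced case in the text, where the sequence is present in two copies but one copy sits on a different chromosome.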

Given the highly repetitive telomere structure and the fact that all current approaches rely 
on the presence of unique sequence to investigate subtelomeric regions, there is a tradeoff using 
current assays between sensitivity and specificity. Sensitivity is defined as having a probe that
detects the smallest deletions (i.e., close to the chromosomal end), and specificity is defined as
having a probe that contains only sequences from a particular chromosome. Probes containing complex
repeats in the distal telomeric and subtelomeric domain may lie closer to the end of the 
chromosome, but lack the specificity of single copy probes (such probes can be used to assess 
the integrity of multiple or all telomeres simultaneously). Current "chromosome-specific" probes 
capable of detecting specific subtelomeric regions are generally large, and usually do not lie in
the distal subtelomeric interval. Due to their larger size, these conventional FISH probes have
a greater likelihood of containing low frequency paralogous sequences found on other 
chromosomes (and hybridizations to such chromosomal targets cannot be suppressed by addition
of Cot-1 DNA). In order to select cloned probe sequences that do not have paralogous copies on
other chromosomes, conventional FISH probes must be comprised of locus specific segments. 
Sequences meeting these criteria are often a considerable distance from the telomere. Deletions 
that occur between the sequence recognized by the probe and the telomere cannot be detected
with such probes. Thus, assays that use large chromosome-specific telomeric probes compromise
the sensitivity of the assay, as more distal terminal rearrangements will fail to be detected. 
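
The relationship between probe position and detectable terminal deletions can be illustrated with a short Python sketch. It is a simplified model, not part of the described assays: the function name, the coordinate convention (kilobases measured from the terminal nucleotide), and the example numbers are hypothetical.

```python
def terminal_deletion_outcome(deletion_size_kb: float,
                              probe_start_kb: float,
                              probe_end_kb: float) -> str:
    """Predict how a terminal deletion affects a probe.

    The deletion is modeled as removing the interval [0, deletion_size_kb)
    measured from the chromosome end; the probe target occupies
    [probe_start_kb, probe_end_kb).
    """
    if deletion_size_kb <= probe_start_kb:
        return "missed"    # deletion lies entirely distal to the probe target
    if deletion_size_kb >= probe_end_kb:
        return "detected"  # the entire probe target is removed
    return "partial"       # part of the target remains; signal may persist


if __name__ == "__main__":
    # Hypothetical probe target located 1000-1100 kb from the chromosome end.
    print(terminal_deletion_outcome(500, 1000, 1100))   # missed
    print(terminal_deletion_outcome(1050, 1000, 1100))  # partial
    print(terminal_deletion_outcome(1500, 1000, 1100))  # detected
```

Under this model, moving the probe target closer to the terminus shrinks the range of deletion sizes that fall in the "missed" category.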

The first generation of chromosome-specific FISH probes for each telomere (except the
acrocentric p arms) were cosmid, fosmid, bacteriophage, P1, and PAC clones derived from half
YACs (Yeast Artificial Chromosomes), which possess large intact terminal fragments of human
chromosomes. These clones are composed of clusters of single copy sequences interspersed with
repetitive sequences on chromosomes. There is a paucity of chromosomal sequences with this
genomic organization at the ends of several chromosomes as a result of the high frequencies of
paralogous sequences (often seen on multiple chromosomes) in the terminal bands of
chromosomes and the relatively high densities of telomere-associated repetitive sequences. Half
YACs were not available for the 1p, 5p, 6p, 9p, 12p, 15q, and 20q telomeres, and probes for these ends were
derived by screening genomic libraries with the most telomeric markers on the human radiation
hybrid map. Consequently, the physical distance between these clones and the cognate telomeres
was unknown. It is now known that some of the commercially-available subtelomeric probes
used in conventional FISH are not located near the telomeres but rather several hundred kilobases
from the end. Interphase mapping has since shown that the commercially-available 9p clone is
<1.2-1.5 Mb from the telomere and the commercially-available 12p clone is >800 kb from the
telomere, whereas the commercially-available 15q clone may be ~100 kb from the telomere. The
distances for some commercially-available 1p, 5p, 6p, 11q, 19p, and Yp clones are still unknown.
Large gap sizes between clones and the corresponding telomere, genomic polymorphism in
hybridization patterns, and cross-hybridization have prompted the development of a second
generation set of telomere-specific clones. While these clones are in the vicinity of the
telomere, substantial distances to the ends of the chromosomes remain. Some of the
commercially available probes are so far from the telomere that they do not even reside in the
terminal light-staining band region of the chromosome. For example, based on the coordinate
of the sequence tag site (STS) in a commercial 14qtel probe, the probe is located in 14q32.32,
a dark G-band, and is therefore closer to the centromere than any probe that would be contained
in the terminal light band. These clones have large inserts, which assure that hybridization
intensities are adequate; however, they may fail to detect deletions of sequences contained within
the probes themselves or of sequences closer to the telomere itself.

In conventional FISH, the DNA probes contain large genomic intervals (from ~50 to
several hundred kilobases) which consist of both unique and repetitive synthetic DNA. Because
repetitive DNA has a widespread distribution, it can interfere with the detection of
chromosome-specific abnormalities. As a result, methods have been developed to suppress the repetitive DNA
and prevent binding of repetitive sequences to chromosomal DNA. One such method involves 
preannealing these repetitive sequences in the probe with an excess of unlabeled repetitive DNA, 
so that only the probe's unique sequences hybridize to the chromosome. 

Conventional probes suffer from many deficiencies including the fact that they are
unsequenced and therefore, their locations have not been accurately determined in chromosomes.
By comparison of the sequences of available sequence tagged sites (STS) contained within these
probes, it has been demonstrated that several of these probes contain sequences that are
considerable distances from the telomere (millions of base pairs). The lengths of the
conventional probes themselves have only been approximately determined and the STS could
occur anywhere within the probe. This means that the precise location of the probe can only be
determined within a window spanning equal distances corresponding to the approximate length
of the probe both proximal and distal of the STS. Furthermore, some of these conventional
probes were derived by functional complementation of half-YACs (which lack telomeres), selecting
for the presence of sequences that serve as telomeres. In fact, several of these synthetic DNA
clones do not contain the actual telomeres of a number of chromosome arms. Telomere-like
sequences (which may have served as telomeres in lineages ancestral to humans) can be found
at multiple internal locations in human chromosomes, and these sequences may have been
selected for in the complementation studies that were developed to retrieve human telomeres and
associated single copy sequences.
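
The positional uncertainty described above, in which the STS may lie anywhere within an unsequenced probe, can be made concrete with a small calculation. The Python sketch below is illustrative only; the function name and the example coordinates are hypothetical, and positions are assumed to be measured in kilobases from the chromosome end.

```python
from typing import Tuple


def probe_location_window(sts_position_kb: float,
                          approx_probe_length_kb: float) -> Tuple[float, float]:
    """Return the interval that must contain the whole probe.

    Because the STS may sit at either extreme of the probe, the probe can
    extend up to one probe length distal or one probe length proximal of it.
    The distal bound is clipped at the chromosome end (position 0).
    """
    distal_bound = max(0.0, sts_position_kb - approx_probe_length_kb)
    proximal_bound = sts_position_kb + approx_probe_length_kb
    return distal_bound, proximal_bound


if __name__ == "__main__":
    # Hypothetical STS 3000 kb from the end, inside a probe of about 150 kb.
    low, high = probe_location_window(3000, 150)
    print(f"probe lies somewhere within {low:.0f}-{high:.0f} kb ({high - low:.0f} kb window)")
```

In this hypothetical case the uncertainty window is twice the approximate probe length, which is the point made in the text.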

Furthermore, the coordinates of several conventional probes cannot be determined 
because the sequence tagged sites (STS) reported by Vysis, Inc. and by Knight et al. correspond 
to their internal laboratory designations, rather than being assigned by the public Human Genome 
Organization nomenclature committee. Unless these laboratory-based STSs were deposited in
the genome database, GenBank, or other public databases, the laboratory designations of these
STSs cannot be related to publicly assigned STSs. Accordingly, due to these obstacles, the
locations of several of these STSs have not been determined in public sources. Therefore,
synthetic clones presumed to contain subtelomeric sequences cannot be anchored on the
reference genome sequence by these STSs and their location in the genome cannot be confirmed 
except by microscopic visualization of these probes. Such microscopic visualization lacks the 
very high resolution that can now be achieved by direct mapping onto the human genome
reference sequence. The inability to map several of the available subtelomeric probes that are
in common use in cytogenetic laboratories has potentially adverse consequences for patients with 
chromosomal abnormalities involving the terminal bands of chromosomes. If these probes 
consist of sequences that are localized considerable distances from the ends of the chromosomes 
(like the 14qter and 16pter commercial probes), then it will not be possible to determine whether
the failure to detect an abnormality is due to the position of the probe on the chromosome, the
size of the rearranged chromosomal region or both of these factors. This is the case for
subtelomeric probes available for chromosomes 1p, 5p, 6p, 11q, 19p, Yp, and Yq. For such probes,
it would not even be possible to determine if the failure to detect an abnormality is due to a false
negative finding (i.e., an error) using the probe. This situation is unacceptable for a
reagent commonly used for clinical diagnosis of disease and an application for a medical
diagnostic device based on them would be rejected by the US Food and Drug Administration 
based on current guidelines. Of course, the probes are labeled for research use only. Moreover, 
it is not even possible for one skilled in the art to investigate the locations of several of these 
probes because the clones from which they were derived are no longer available. This means that
these conventional cloned reagents which are in common use cannot be subjected to quality
control standards by independent researchers, despite the fact that these reagents are commonly 
used for detection of clinical abnormalities. Since the completion of the human genome 
reference sequence, several companies that produced genomic reagents for human genome 
mapping and characterization have discontinued support for these products or no longer maintain
them, due to lack of demand. One of these companies that produced cloned synthetics for
detection of subtelomeric rearrangements is no longer in business and the company that acquired 
them discontinued support for this product line 2 years ago. Accordingly, one thing that is 
needed in the art is a set of probes that are precisely localized and are derived from available 
genome sequences which are essentially perpetually available. 

Finally, it has been shown that prior art probes suffer from cross hybridization to other
locations in the genome in addition to the location of interest. This occurs because many
synthetic DNA probes for subtelomeric analysis are not sequenced and therefore, it is not
possible to verify by sequence analysis of the human genome that the DNA sequences contained
in them do not have paralogous sequences at other distant locations on the same or other
chromosomes. Consequently, several of these probes have been found to cross-hybridize to other
chromosomes. The manufacturer (Vysis, Inc.) discloses in its product literature that the
following probes cross-hybridize to other chromosomes:



Probe     Cross Hybridization Location
3q        2p
4p        17p
8q        11p
10p       12p
11p       16p/17p/20p
16q       4q/9q/10p/16p/18p
17p       11p

Additionally, the Xp and Yp share homology and a single probe that detects both is 
available. Similarly, a single probe to detect both Xq and Yq is available as they share 
homology. 

A hypothetical example can be used to describe the potential adverse consequences of
such cross-hybridization. Suppose a parent carries a cryptic chromosome rearrangement that
is a translocation between chromosomes 10p and 12p, and this translocation is transmitted to
her offspring in an unbalanced manner, such that one of the 10p sequences is missing and the 12p
sequence is duplicated. The 10p probe hybridizes to the normal copy of chromosome 10p and
cross-hybridizes to a single chromosome 12p, which would suggest that a translocation between
these chromosomes had occurred. Because of the loss of 10p sequences from the other homologous chromosome,
there would be only one hybridization evident each on chromosomes 10p and 12p. However, a
chromosome 12 probe would hybridize to three copies of this chromosome (the normal and
duplicated copies), which would be inconsistent with the results found with the 10p probe.
Unequivocal interpretation of both findings would require unnecessarily complex (and
ultimately, incorrect) explanations. Accordingly, what is needed in the art are probes that do not
cross-hybridize. Such probes would clearly and simply demonstrate the presence of the
translocation and the unbalanced nature of the karyotype.

Currently the two most common techniques for studying subtelomeric regions are 1)
FISH of probes (BAC, PAC, P1, YAC and other large synthetic clones) mapped to terminal
chromosomal bands, and 2) the use of polymorphic microsatellite markers mapped to the
subtelomeric region. For the first technique, a number of disadvantages are observed:
cross-hybridization of certain subtelomeric probes is evident; some polymorphisms resulting in
deletions have been detected; and not all of the probes are as close to the chromosomal termini
as reported, such that they would not be able to detect smaller subtelomeric rearrangements.
Table 3 shows the distance of the common commercial probes used in clinical diagnosis from
the end of the chromosome.

For the second technique that involves use of polymorphic microsatellite analysis, one
disadvantage is that the markers must discriminate between chromosomes (i.e., be informative)
and most of the informative markers are located a relatively long distance from the telomere. As
a result, small deletions could be easily missed by this method. An additional disadvantage is
that DNA samples from the patient's parents are required. 

Other molecular techniques have been developed and used for assessing subtelomeric 
regions. The multiplex amplifiable probe hybridization (MAPH) allows assessment of copy 
number at specific loci. This technique relies on correct genomic placement of currently mapped
genetic loci/STSs and will miss small deletions if the loci/STSs have been placed in a wrong
position within the chromosomal end. For example, D16S3400 was originally placed within 300
kb of the chromosomal end but we have placed it more than 3000 kb from the chromosomal end
using the April 2003 version of the genome sequence (see Table 3).

Multiplex ligation dependent probe amplification (MLPA) is conceptually similar to
MAPH, except that it is less tedious and simpler to perform on specimens from patients. Like
MAPH, determination of sequence copy number in the specimen is dictated by an initial
hybridization of probe to purified patient genomic DNA. Instead of measuring the amount of
hybridized sequence with a secondary probe that is related to a target sequence, MLPA achieves
specificity for the hybridization target by ligation of very short sequences homologous to the
target in vitro. Readout occurs by PCR amplification of the annealed, hybridized probes using
universal primers in vector sequences adjacent to the complement of the genomic target. Both
approaches, however, depend on prior knowledge of the single copy nature of the genomic target
sequence in normal individuals, since abnormalities are detected by determining the ratio of
hybridization in normal and abnormal targets. This approach contrasts with the method of the
instant invention, in which the single copy properties of a sequence are established during the
development of the probe. This is not a trivial difference, since the presence of paralogous
sequences in the genome related to the probe could result in false positive detection and distort
the copy number ratio determined with the probe sequence. Given the very short lengths of the
homologous genomic sequence contained in the MLPA probes, one skilled in the art would have
to have prior knowledge of the single copy nature of the gene region from which the probe was
derived, in order to be confident that paralogous targets were not present in the genome. Finally,
while MLPA is simpler to perform than MAPH, a substantial up-front effort is required to clone
a pair of genomic sequences in phage vectors by synthetic techniques prior to testing patient
specimens. Such cloning steps are unnecessary in the art of the present invention.
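
The way paralogous sequences can distort a copy number ratio may be illustrated with an idealized calculation. The Python sketch below is an assumption-laden model, not a description of MAPH or MLPA data analysis: it supposes that signal is proportional to the total number of genomic copies recognized by the probe and that paralogous copies hybridize as efficiently as the intended target. The function and parameter names are hypothetical.

```python
def expected_signal_ratio(patient_target_copies: int,
                          paralog_copies: int = 0,
                          normal_target_copies: int = 2) -> float:
    """Idealized patient/normal signal ratio for one probe.

    paralog_copies counts additional genomic copies recognized by the probe
    that are unaffected by the rearrangement (assumed equal in patient and
    normal genomes).
    """
    patient_total = patient_target_copies + paralog_copies
    normal_total = normal_target_copies + paralog_copies
    return patient_total / normal_total


if __name__ == "__main__":
    # Heterozygous deletion of the target, probe is truly single copy.
    print(expected_signal_ratio(1))                    # 0.5
    # Same deletion, but one paralogous locus (two copies) also hybridizes.
    print(expected_signal_ratio(1, paralog_copies=2))  # 0.75
```

Under these assumptions a heterozygous deletion gives a ratio of 0.5 with a single copy probe but 0.75 when one paralogous locus is also recognized, so the deviation from the normal ratio of 1.0 is halved.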

Array based comparative genomic hybridization (CGH) has been used to survey
subtelomeric rearrangements. This technique has the advantage of surveying multiple regions
of the genome simultaneously; however, it has a number of pitfalls that are not inherent in the
present invention. For detection of unbalanced rearrangements, large cloned synthetic DNA
probes in the telomeric region are required. (a) Several of these probes are not close to the
telomere; (b) the large size of these probes precludes the detection of small rearrangements;
(c) terminal chromosome rearrangements that overlap a portion of the sequence homologous to
the probe will be scored as intact (i.e., false negative results); and (d) hybridization of repetitive
sequences in these probes must be blocked, typically with an excess of Cot-1 DNA. Variability
in the batches of Cot-1 DNA and in the efficiency of this blocking procedure has been shown to
compromise the laboratory-to-laboratory reproducibility of this procedure, which makes it less
suitable for clinical or research testing.

Most of these techniques do not detect balanced translocations, the detection of which is needed for
identifying parental carriers of these rearrangements that could result in additional offspring with
unbalanced chromosome complements and clinical abnormalities. Conventional FISH probes
will detect these rearrangements if the chromosome breakpoint is contained within sequences
homologous to the probe or if the probe is known to be distal to the breakpoint. The likelihood
that a subtelomeric probe would detect such a rearrangement is quite low, since the probe is
relatively small (100-300 kb) compared to the potentially large region in which the break might
occur (several megabases) and generally has not been precisely localized within the chromosomal
interval. By contrast, the breakpoint for such rearrangements can be identified by systematic
hybridization of an array of single copy probes derived from this chromosomal band (Knoll and
Rogan, Am J Med Genet 2003, the teachings and content of which are hereby incorporated by
reference), whose positions in the genome are determined during the development of these
probes.

SUMMARY OF THE INVENTION 
The present invention overcomes the deficiencies of the prior art and provides a distinct 
advance in the state of the art. In particular, the present approach develops unique sequence,
single copy hybridization probes that are considerably smaller and generally closer to the
chromosome ends than available corresponding cloned probes for detection of subtelomeric 
abnormalities. Preferably, each probe is specific for a single chromosome arm. Additionally, 
the probe must be of sufficient length for detection, preferably by fluorescence microscopy, array 
comparative genomic hybridization or related techniques. The probes of the present invention
preferably have lengths less than 25 kb, more preferably between about 25 base pairs and about
15 kb, still more preferably between about 50 base pairs and about 12 kb, still more preferably
between about 60 base pairs and about 10 kb, even more preferably between about 70 base pairs
and about 9 kb, still more preferably between about 80 base pairs and about 8 kb, still more
preferably between about 90 base pairs and about 7 kb, still more preferably between about 100
base pairs and about 6 kb, still more preferably between about 250 base pairs and about 5 kb, still
more preferably between about 500 base pairs and about 4.5 kb, more preferably between about
1 kb and about 4 kb, and most preferably between about 1.5 kb and about 3.5 kb. Such preferred
probes are up to 100X smaller than the currently available probes. Advantageously, these small
probes can be designed to exclude hybridization to low copy paralogous sequences on other
chromosomes. Due to their size and the relative abundance of paralogous sequences in these
regions, larger cloned probes, such as those that are currently commercially-available, are more 
likely to contain sequences with paralogs on other chromosomes. Such larger probes have 
greater potential to compromise specificity, and therefore might not be ideal for distinguishing 
the subtelomeric region of a particular chromosome from other genomic sequences. The
requirement for hybridizing larger probes provides one explanation as to why these clones are
comprised of genomic sequences that lie further away from the telomere and why some contain
paralogous, cross-hybridizing sequences. Moreover, the isolated short genomic intervals
recognized by single copy probes permit the identification of specific hybridization intervals that
are closer to the ends of chromosomes than available synthetic DNA probes that are presently 
used for detection of subtelomeric rearrangements. Hybridization of probes of the present 
invention is detectable regardless of whether the entire probe or only a portion of the probe is
bound to the chromosome. Therefore, the extent of a chromosomal region gain or loss that
involves only a portion of the probe sequence may not be recognized by the prior art probes but 
will be recognized by the probes of the present invention. The shorter probes of the present 
invention will thereby produce fewer misdiagnoses (false negative results for chromosome 
deletions, for example) when analyzing the genomes of patients whose breakpoints occur within
the chromosomal sequences spanned by the hybridized probe.

Probe design for single copy hybridization should permit generation of considerably 
smaller probes that are closer to the chromosomal ends than are currently available. Generally, 
the method comprises searching a moving window beginning at the terminal nucleotide on a 
chromosome end on the human genome sequence database (i.e., Public Consortium Celera
Genomics Data Bases) to identify single copy intervals in the terminal chromosomal band.
Preferably the single copy interval is the single copy interval in the subtelomeric region that is 
closest to the telomere. Preferably, the single copy interval is within about 8000 kb of the 
terminal nucleotide of the telomere of the chromosome, more preferably it is within about 7000 
kb of such a terminal nucleotide, still more preferably it is within about 6000 kb of such a 

20 terminal nucleotide, even more preferably it is within about 5000 kb of such a terminal 
nucleotide, more preferably it is within about 3500 kb of such a terminal nucleotide, still more 
preferably it is within about 2500 kb of such a terminal nucleotide, even more preferably it is 
within about 1500 kb of such a terminal nucleotide, more preferably it is within about 1000 kb 
of such a terminal nucleotide, even more preferably it is within about 800 kb of such a terminal 

25 nucleotide, more preferably it is within about 600 kb of such a terminal nucleotide, more 
preferably it is within about 500 kb of such a terminal nucleotide, still more preferably it is within 
about 400 kb of such a terminal nucleotide, even more preferably it is within about 300 kb of 
such a terminal nucleotide, still more preferably it is within about 200 kb of such a terminal 
nucleotide, and most preferably it is within about 100 kb of such a terminal nucleotide. The 

30 method may then comprise the step of verifying that the identified interval is in fact a single copy 
sequence and is found only in that interval. Such verification can take place either 
computationally or experimentally and a preferred method includes both forms of verification. 




Experimental confirmation or verification can be accomplished through conventional techniques,
including experimentally hybridizing the single copy sequence to chromosomes. Computational
verification can occur by conventional computer-based techniques for searching genomes,
including analyses with BLAT or BLAST software. However, other equally suitable techniques
for genome-wide computational sequence comparisons would also verify the single copy nature
of potential probes. Single copy sequences are then sorted by length and primers are designed
for some of the intervals (preferably those greater than 1.5 kb in length, because they can be
reliably visualized by FISH, and those closest to the telomere but in the subtelomere region).
Primers developed during such an approach would indicate to those of skill in the art that the
desired sequences could be developed using conventional techniques and publicly available
knowledge, including the publicly available genome databases. This is because the coordinates
of the primers can be found in the genome databases and then these primers can be used to
generate the sequence of interest. Furthermore, the developed sequence can be verified by
comparison to the genome drafts. Primers developed by the present invention and their locations
are provided herein.
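
By way of illustration only, the sorting and selection of single copy intervals described above might be sketched in Python as follows. The Interval type, the function name and the example coordinates are hypothetical and are not taken from the sequence listing; only the 1.5 kb length threshold and the preference for intervals nearest the telomere come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical single copy interval, measured in bp from the terminal nucleotide."""
    start: int  # distance of the first base from the chromosome end
    end: int    # distance just past the last base (half-open interval)

    @property
    def length(self) -> int:
        return self.end - self.start


def select_for_primer_design(intervals, min_length=1500):
    """Sort candidates by length, keep those longer than min_length,
    then rank the remaining intervals by proximity to the telomere."""
    by_length = sorted(intervals, key=lambda iv: iv.length, reverse=True)
    long_enough = [iv for iv in by_length if iv.length > min_length]
    # Stable sort: intervals with the same start keep their longest-first order.
    return sorted(long_enough, key=lambda iv: iv.start)


# Invented example coordinates (assumptions for illustration).
candidates = [
    Interval(12_000, 12_900),    # 0.9 kb, too short for reliable FISH
    Interval(45_000, 47_300),    # 2.3 kb
    Interval(20_500, 22_400),    # 1.9 kb
    Interval(150_000, 154_000),  # 4.0 kb
]
selected = select_for_primer_design(candidates)
```

In this sketch the first element of selected is the interval a designer would try first, since it is both long enough for FISH and nearest the chromosome end.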

Single copy probe technology, such as that disclosed in U.S. Serial Nos. 09/573,080 (filed
May 16, 2000) and 09/854,867 (filed May 14, 2001) (the teachings and content of both
applications are hereby incorporated by reference), is appropriate for developing subtelomeric
sequences, since the majority of probes hybridize only to the correct chromosomal location in
the majority of chromosomes, and single copy probes can be designed, amplified, purified and
labeled in parallel. For probes that do not hybridize to a single location, when related sequences
are missing from the draft genome sequence, alternative primers were developed for these loci
or neighboring loci. Probes that show hybridization to multiple loci can also be bisected into two
or more parts to determine which component hybridizes to paralogous loci or repetitive
sequences. Such bisection involves development of internal primers, possibly new end primers,
and hybridization of the new products to chromosomes. Unlike other chromosomal regions, the
subtelomeric intervals of many chromosomes present some unusual challenges in the design of
single copy probes. While these regions are quite gene-rich, there has been considerable
exchange and duplication of genetic material between the terminal sequences of different
chromosomes.

In more detail, subtelomeric single copy probes are developed using computer
software-based design of DNA probe sequences corresponding to subtelomeric intervals. This involves
identification of most subtelomeric single copy intervals, then comparison of these intervals with
the genome draft to verify the sequence interval is not present at other locations in the human
genome sequence. Because the human genome sequence is considered to be more accurate as
additional data are incorporated in more recent versions of the sequence, currently designed
probes are compared to these versions of genome sequence to determine if coordinates of
designed probes remain within 300 kb of the end of the chromosome. If large amounts of
additional sequence (>300 kb) have been added to the telomeric end of the draft sequence of a
chromosome since the production of a probe, new probes that are closer to the chromosomal ends
are designed from the newly established subtelomeric interval.
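
As a non-limiting sketch, the re-check of probe coordinates against a newer genome version could be expressed as below. The function name, the 1-based inclusive coordinate convention and the example lengths are assumptions made for illustration; the sketch also assumes that coordinate 1 of the assembled sequence is the p-arm end, which is not true of every chromosome in every draft.

```python
def probe_still_terminal(probe_start, probe_end, chrom_length, arm,
                         max_distance=300_000):
    """Return True if the whole probe lies within max_distance bp of the
    chromosome end. Coordinates are 1-based and inclusive (an assumption)."""
    if arm == "p":
        # Bases from the p-arm terminus through the far end of the probe.
        distance = probe_end
    elif arm == "q":
        # Bases from the near end of the probe through the q-arm terminus.
        distance = chrom_length - probe_start + 1
    else:
        raise ValueError("arm must be 'p' or 'q'")
    return distance <= max_distance


# Hypothetical q-arm probe designed against an older draft of 50,000,000 bp.
old_ok = probe_still_terminal(49_800_001, 49_802_500, 50_000_000, "q")

# A newer draft adds 400 kb to the telomeric end; the probe coordinates are
# unchanged here because the added sequence lies distal to the probe.
new_ok = probe_still_terminal(49_800_001, 49_802_500, 50_400_000, "q")
```

In this invented example old_ok is True and new_ok is False, which corresponds to the situation in which a new probe would be designed from the newly established subtelomeric interval.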

Next, fragments are synthesized using PCR amplification with multiple pairs of primer
sets for each subtelomeric region. Other approaches or direct synthesis of single copy probes
would also be feasible (see U.S. Pat. No. 6,521,427, the teachings and content of which are hereby
incorporated by reference); however, these methods are more suited for high volume probe
production than the instant methods. The majority of designed probes can be amplified, and
amplification can be optimized to produce a single homogeneous PCR product. Infrequently, no
amplification is observed for a set of primers. This necessitates that the PCR amplification
conditions be carefully optimized, and that primer and amplification product sequences be
re-examined to determine if they exhibit homology to sequences on other chromosomes. If PCR
amplification is still not achieved, alternative primer sets unique to this locus are prepared and
the amplification procedure is repeated.

Once amplification reactions are optimized, multiple reactions (or a single large volume
reaction) are performed in parallel to obtain adequate product for hybridization. The product is
either isolated by gel electrophoresis and purified by column centrifugation, or purified from
reaction mixtures by non-denaturing high performance liquid chromatography (DHPLC). The
product is then labeled by nick translation, purified and hybridized to normal metaphase
chromosomes from two individuals (at least one male) and analyzed by fluorescence microscopy.
If hybridization efficiency is low (due to low specific activity of incorporation of the modified
nucleotide), the probe is relabeled and the chromosomal hybridization is repeated. Multiple
single copy probes from adjacent intervals may be combined to increase hybridization signal
intensities.

For probes that hybridize to multiple sites, several alternative methods are available. One
such method involves bisecting the primary product into two or more derived products, which
are synthesized, labeled and hybridized. If information in the genome sequence database reveals
which probe sequences contain potential paralogous copies, the probe is bisected to exclude such
sequences. The genome sequence from the region is examined for its location and sequence
content in multiple versions of the genome draft, as the genome draft is continually being updated
with new information. If both bisected components continue to cross-hybridize, a single copy
probe is designed from the adjacent proximally-located genomic interval. Alternatively or
additionally, the primary product is also preannealed with Cot 1 DNA to determine if
hybridization to multiple chromosomal loci can be reduced or eliminated. If this procedure
results in a chromosome-specific subtelomeric hybridization pattern, it indicates that the probe
contains a highly reiterated sequence that was not detected during probe design. In this
circumstance, a single copy probe is designed from the adjacent proximally-located single copy
genomic interval.
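
The bisection of a cross-hybridizing product into derived products can be illustrated with the following minimal Python sketch. The function name, the half-open coordinate convention and the example coordinates are hypothetical; the sketch only computes the coordinates of the derived products and does not design the internal primers themselves.

```python
def bisect_probe(start, end, paralog=None):
    """Split the probe interval [start, end) into derived products.

    Without paralog information the probe is split at its midpoint. If a
    suspected paralogous sub-interval (p_start, p_end) is supplied, the
    derived products are the flanking pieces that exclude it."""
    if paralog is None:
        mid = (start + end) // 2
        pieces = [(start, mid), (mid, end)]
    else:
        p_start, p_end = paralog
        left = (start, max(start, min(p_start, end)))
        right = (min(end, max(p_end, start)), end)
        pieces = [left, right]
    # Drop empty pieces, e.g. when the paralog sits at the edge of the probe.
    return [(s, e) for s, e in pieces if e > s]


halves = bisect_probe(1000, 5000)
flanks = bisect_probe(1000, 5000, paralog=(2200, 2600))
edge = bisect_probe(1000, 5000, paralog=(1000, 1500))
```

Each derived interval would then be synthesized, labeled and hybridized separately to determine which component is responsible for the signal at multiple loci.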

The present invention therefore finds great utility in detecting chromosomal
rearrangements. It has recently been estimated that chromosomal rearrangements resulting in an
imbalance in DNA sequences near the ends of chromosomes may account for up to 10% of
individuals with idiopathic mental retardation and other clinical findings. Specialized
chromosome testing such as conventional fluorescence in situ hybridization (FISH) involving
DNA probes from these chromosomal regions is required to detect these abnormalities. Now that
the human genome sequence has become available, we have recognized that a substantial number
of the commercial DNA probes that are commonly used to detect these rearrangements are not
found at the ends of the chromosomes. Many of the probes of the present invention are closer to
the ends of chromosomes than the currently available probes, thereby allowing identification of
some patients with terminal rearrangements of human chromosomes that may not be identifiable
with currently available commercial probes. Probes produced in this way are useful for: (a)
detecting a broader spectrum of abnormal chromosomal termini than currently detectable with
existing cloned probes, (b) providing insight into how these chromosomal regions are organized,
and (c) correlating how the sequences of these chromosomal regions are related to each other and
to other chromosomal regions. We have previously used human genome sequences to directly develop
single copy probes targeted to a wide variety of chromosomal regions for fluorescence in situ
hybridization (scFISH) (U.S. Serial No. 09/854,867, filed May 14, 2001, the teachings and content
of which are hereby incorporated by reference). Such probes may also be useful in detecting previously
unrecognized terminal rearrangements in some patients.

The present invention also provides a streamlined process for producing arrays of single
copy probes. Arrays of multiple single copy probes can be designed to cover the same target
sizes as conventional recombinant probes; however, other unique applications of these arrays
increase the resolution of delineating abnormalities. scProbe arrays can be used to
simultaneously detect targets either from multiple chromosomal regions or from a single continuous
genomic interval, and the automated production of single copy probe arrays is a high throughput
process. Such a process was used to simultaneously develop single copy probes from all
euchromatic chromosomal termini. Such arrays can also be used for precise delineation of
translocation, deletion, and other rearrangement breakpoint boundaries in subtelomeres. For
example, multiple probes have been developed from chromosome 9q34 and different subsets of
these probes have been hybridized in combination in order to examine the ABL1 chromosomal
breakpoints in chronic myelogenous leukemia (CML) and to detect upstream ABL1 deletions
that are associated with early blast crisis (Knoll and Rogan, Sequence-Based In Situ Detection
of Chromosomal Abnormalities at High Resolution, Am. J. Med. Genet. 121A:245-257 (2003)).

One aspect of the present invention is that the single copy probes of the present invention
(with the exception of those for chromosomes 3p and 19q) are located in the generally light-staining
terminal G-bands of the chromosome. This is significant because in routine clinical cytogenetic
analysis, metaphase chromosomes are banded and examined microscopically to look for
alterations in chromosome number or chromosome structure. Chromosome pairs are aligned
according to size and banding pattern. This alignment is called the karyotype and it is the
standard and basic method for examining the integrity of all chromosomes in a cell. In a normal
human cell, there are 46 chromosomes: 22 pairs of autosomes (numbered 1 through 22) and one
pair of sex chromosomes (XX in females and XY in males). Chromosomes are paired and
arranged in the karyotype from largest to smallest in size and according to placement of their
centromere and the subsequent designation of the chromosome as metacentric, submetacentric,
or acrocentric. Each chromosome contains DNA (unique single copy, repetitive dispersed and
highly reiterated DNA) and protein. The centromeres of each chromosome and the majority of
the chromosome Y long arm contain heterochromatin, which is comprised of repetitive DNA that
is transcriptionally inactive. The short arms of acrocentric chromosomes also have highly
repetitive DNA in addition to multiple copies of genes for ribosomal RNA. The telomeres of
chromosomes contain short telomere-specific DNA repeat sequences (TTAGGG)n that function
to cap and protect the ends of the chromosome. Adjacent to the telomeric regions are
subtelomeric regions, which are comprised in part of chromosome specific DNA sequences and
telomere associated repeats (Figure 16). Exceptions to chromosome specificity of the
subtelomeric regions include the short arms of acrocentric chromosomes and the long arm of the Y
chromosome, which contains heterochromatin and shares homology with the end of the X
chromosome long arm.

When chromosomes are pretreated with methods that could involve heat or chemicals,
each of the 22 autosomes and the sex chromosomes have a characteristic banded pattern that
uniquely identifies that chromosome. The bands are dark and light staining structures on
metaphase chromosomes and serve as chromosome specific landmarks. It is onto these structures
that cloned DNA sequences have been mapped. They provide reference points for localizing and
ordering nucleic acid probes, sequence tagged sites, ESTs, DNA contigs, genes, etc. that
otherwise could not be referenced, as no single chromosome has been sequenced in its entirety
due to the repetitive nature of centromeric regions, heterochromatic regions and acrocentric short
arms.

The commonly used banding pattern in clinical cytogenetics is referred to as G-banding,
and this banding is often achieved by pretreating chromosomes with trypsin followed by staining
them with Giemsa, but other methods of treatment such as staining with fluorescent dyes (such
as but not limited to 4',6-diamidino-2-phenylindole) also yield chromosome specific banding
patterns. R-banding, or reverse banding, is the reversed pattern of light and dark G-bands.
Capturing chromosomes at different times of the cell cycle, i.e., metaphase versus prometaphase,
results in chromosomes with more or fewer visible bands.

Chromosome anomalies identified by karyotyping of banded chromosomes are described
using the International System for Cytogenetic Nomenclature (ISCN), first introduced in 1971
and published in 1972, with the 1995 version in current usage around the world (ISCN, 1995).
This nomenclature is the universal language for cytogeneticists and clinicians to describe
chromosomal abnormalities so that findings can be communicated to one another and other
clinical professionals without the need to provide a karyotype each time. The ISCN also provides
a reference for chromosome band resolution. The ISCN defines 3 different levels of band
resolution by the number of visible bands: 400, 550, and 850 bands per haploid karyotype. A
typical high-resolution cytogenetic study will have a band resolution of at least 550 bands. At
this level of resolution, the terminal G-bands are light staining for all chromosomes except
chromosomes 3p, 19q and Yp. Chromosomal bands for many regions separate into light and/or
dark staining sub-bands as the resolution increases. At the 850 band level, chromosome Yp also
has a light staining terminal band, the terminal chromosome 3p band (i.e., 3p26) separates into
three small sub-bands - two dark (3p26.1, 3p26.3) and one light (3p26.2), and the terminal
chromosome 19 band (19q13.4) separates into three small sub-bands - two dark (19q13.41,
19q13.43) and one light band (19q13.42). As a result of the chromosomal ends being light
staining and thus appearing the same for most chromosomes, any exchanges (i.e., translocations)
between only these terminal chromosomal bands or within those chromosomal regions would not
be recognized by routine cytogenetic analysis. Such a physical characteristic requires the
utilization of other molecular methods, such as fluorescence in situ hybridization (FISH) with
chromosome specific nucleic acid probes, in order to identify terminal chromosomal band
rearrangements.

The structural definitions provided by this nomenclature allow probes (including genes)
to be mapped to chromosomal bands (which are an average size of 5 million base pairs) by those
of skill in the art. Advantageously, ISCN banding notation, although imprecise, is stable.
Moreover, the human genome sequence is only interpretable by reference to this banded
chromosome scaffold. In fact, the sequence is not complete because limitations of technology
have not permitted sequencing of (a) centromere and heterochromatin and (b) acrocentric
chromosome (13, 14, 15, 18, 21, 22) p arm sequences. As a result, the existing array of human
genome contigs can unequivocally be placed on this scaffold by reference to the banding
information. Otherwise, one without knowledge of the genome sequence might think, for
example, that position 1 of chromosome 21 in either the public or private human genome
sequence databases actually begins at the beginning of the p arm, which is not correct.
Accordingly, in order to accurately and consistently describe where sequences are located, one
must use the coordinate and the sequence together, as using either the sequence or the coordinate
alone as the structural feature that links the probes together would lead to erroneous results.

Another aspect of the present invention provides methods for the application of single
copy products for solid phase hybridization of subtelomeric chromosomal sequences. One
skilled in the art can appreciate that single copy nucleic acid products synthesized by the instant
method can be stably attached to a solid surface by covalent chemical or electrostatic charge
neutralization, and subsequently hybridized to a solution composed of a mixture of labeled
nucleic acids. Typically, the substrate will be a microscope slide; however, other surfaces, for
example columns, capillaries or chips, may also be used. The nucleic acid mixtures may be
comprised of purified DNA complete genomes, a set of synthetic clones, DNA fragments, PCR
products or a library of cDNA or cRNA. An array of single copy probes of the art may be used
as targets for comparative genomic hybridization (CGH) methods. This array would be
advantageous for detection of subtelomeric rearrangements compared to current arrays based on
synthetic genomic clones. The hybridization reaction of labeled genomic DNA to arrays of
synthetic genomic clones requires the addition of a reagent containing repetitive DNA sequences,
also known as Cot 1 DNA, for blocking repeat sequence hybridization. The array CGH technique
offers an alternative approach for simultaneous identification of monosomy and trisomy of the
subtelomeric regions of chromosomes. This is based on comparing the relative intensities of
hybridization of normal and patient genomic sequences, each labeled with a different
fluorescent moiety. In a recent multicenter study of array CGH based on cloned probes (Carter
et al., Cytometry 49:43-48, 2002, the teachings and content of which are incorporated by
reference herein), variability in suppression of repetitive sequence hybridization in these clones
was shown to be the most common explanation for lack of reproducibility between laboratories
working with the same batch of labeled genomic probes and clones. The failure to completely
suppress repeat sequence hybridization introduced errors in measurements of the
normal/abnormal fluorescence intensity ratios. This source of error would not be present using
arrays comprised of single copy products, since it would not be necessary to add blocking reagent
to the hybridization reaction. In addition, delineation of the boundaries of the imbalanced
chromosomal region would be more precise using CGH arrays comprised of single copy products,
since the locations of these probes on the chromosome have been precisely defined at the
nucleotide sequence level, in contrast with many synthetic genomic probes that have been
traditionally used for array CGH and FISH analysis of subtelomeric rearrangements.
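
For illustration, the comparison of patient and normal hybridization intensities in array CGH might be sketched as follows. The function name, the example intensities and the log2 threshold of 0.3 are hypothetical choices and are not values taken from this description; in an idealized experiment, one copy against two copies gives a log2 ratio of -1, and three copies against two gives a log2 ratio of about 0.58.

```python
import math


def classify_copy_number(patient_intensity, normal_intensity, threshold=0.3):
    """Classify one array spot from the patient/normal fluorescence ratio.

    threshold is an assumed cutoff on the log2 ratio; a real assay would
    calibrate it against control hybridizations."""
    if patient_intensity <= 0 or normal_intensity <= 0:
        raise ValueError("intensities must be positive")
    log2_ratio = math.log2(patient_intensity / normal_intensity)
    if log2_ratio <= -threshold:
        return "monosomy"
    if log2_ratio >= threshold:
        return "trisomy"
    return "balanced"


# Invented intensities for three subtelomeric spots (arbitrary units).
calls = {
    "spot_A": classify_copy_number(500.0, 1000.0),   # about half the signal
    "spot_B": classify_copy_number(1500.0, 1000.0),  # about 1.5x the signal
    "spot_C": classify_copy_number(1020.0, 1000.0),  # essentially equal
}
```

Because residual repeat sequence hybridization adds signal to both channels and pulls the ratio toward 1, incomplete suppression would tend to move true monosomy or trisomy calls toward the balanced range in a sketch of this kind.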

In another aspect of the present invention, a method of using the probes and correlating
them with clinical phenotypes is provided. Subtelomeric regions have been studied by
conventional FISH with synthetic DNA probes in individuals with cytogenetically normal
chromosomes (at ≥550 band resolution) to identify a molecular defect. These regions have also
been studied in some individuals with visible cytogenetic abnormalities to further characterize
the abnormality. The normal chromosome study population includes 1) those with infertility or
multiple pregnancy loss; and 2) individuals with mental retardation in which the common causes
of mental retardation have been excluded and the cause remains unknown (i.e., idiopathic mental
retardation). For the cytogenetically normal patient populations, the subtelomeric results of these
studies did not demonstrate any increase in abnormalities in individuals with multiple pregnancy
losses or infertility. However, for those individuals with a diagnosis of idiopathic mental
retardation, subtelomeric abnormalities were found in ~0.5% with mild mental retardation, and
in ~5% (range of 0-10%) of those with moderate to severe mental retardation and other clinical
abnormalities. For the moderately to severely retarded individuals, different studies report a wide
range in the frequency of subtelomeric abnormalities. This is probably related to ascertainment
bias as a result of the relatively nonspecific clinical criteria that were used to define the
subtelomeric study population. The best clinical indicators for performing subtelomeric analysis
in moderately to severely retarded individuals included a positive family history of mental
retardation, growth retardation (prenatal and postnatal), dysmorphic facies and one or more other
nonfacial dysmorphic features and/or congenital abnormalities.

Mental retardation is the common feature in most if not all patients with subtelomeric
abnormalities resulting in genetic imbalances. There are few subtelomeric deletions that result
in a specific set of clinical features that can direct the clinician towards a diagnosis. The majority
of patients with subtelomere abnormalities currently lack a characteristic set of clinical findings.
For these patients, the subtelomere defect is generally loss of the region (i.e., deletion or
monosomy) or loss of one region and gain of another chromosomal end due to an unbalanced
reciprocal translocation (i.e., partial monosomy for one chromosome and partial trisomy for
another chromosome). Given the number of chromosomes and the number of subtelomeric
regions, there are a very large number of different combinations of partial monosomy and partial
trisomy for different subtelomeric regions. It seems likely that the rather substantial number of
potential chromosome rearrangements would result in an equally diverse set of clinical
phenotypes. There are several other factors that could also give rise to the clinical variability.
They include: 1) the amount (and genetic content) of the terminal band or bands that are lost in
deletions, given the length of the terminal chromosomal bands (several million base pairs), 2)
the size of the chromatin loss and gain in unbalanced translocations, and 3) variable
unmasking of recessive alleles on homologs. For most subtelomeric abnormalities, the number
of patients with similar abnormalities reported is limited and for some subtelomeric regions, no
cases have been reported. In about half of patients, the subtelomere rearrangements appear to
be de novo. The remaining half are inherited through transmission of an abnormal chromosome or
chromosomes from a carrier parent. A sufficient number of patients with such rearrangements
will have to be ascertained in order to identify common clinical findings; because of the
imprecise localization of currently available probes and the clinical variability seen in patients,
it is unlikely that it will be possible to diagnose specific chromosome imbalances based on
clinical findings. Therefore, the only practical strategy for analyzing this group of patients is a
comprehensive examination of all subtelomeric regions. After the abnormal subtelomeric region
or regions are identified, the size of the imbalance (and the specific genes involved) could be
further characterized by testing with a set of different probes derived from that terminal
chromosomal band.
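
The statement above that there are a very large number of combinations of partial monosomy and partial trisomy can be made concrete with a short calculation. The sketch below is illustrative only: the count of 41 distinct subtelomeric regions is an assumption chosen for the example and is not a figure stated in this description, and the function name is hypothetical.

```python
def count_unbalanced_combinations(n_regions):
    """Number of ordered pairs (region lost, region gained) drawn from
    n_regions distinct subtelomeric regions, with the two regions different."""
    return n_regions * (n_regions - 1)


n_regions = 41  # assumed number of distinct subtelomeric regions (illustrative)

pure_deletions = n_regions                                  # loss of one region only
derivative_pairs = count_unbalanced_combinations(n_regions)  # monosomy plus trisomy
total_imbalances = pure_deletions + derivative_pairs
```

Under this assumption there would be 41 simple terminal deletions and 1,640 distinct monosomy and trisomy pairings, before any variation in the size of the lost or gained segment is taken into account.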

For the few subtelomeric deletions that result in a specific set of clinical features that
direct the diagnosis, a specific subtelomeric probe will be adequate to confirm the diagnosis. A
set of probes for the specific subtelomeric region will delineate the size or length of the deletion
that defines the specific clinical findings in a given patient. Several well characterized
syndromes that result from deletion of only a portion of a terminal chromosomal band include
monosomy 1p36 syndrome (chromosome 1p deletion), Wolf-Hirschhorn syndrome (chromosome
4p deletion), Cri-du-chat syndrome (chromosome 5p deletion) and Miller-Dieker syndrome
(chromosome 17p deletion). Nevertheless, patients with these syndromes have a constellation
of clinical findings, some of which are variable, depending on deletion size and other genetic
factors including unmasking of one or more recessive genes.

In addition to the inherited or constitutional chromosome abnormalities, acquired
chromosome abnormalities as observed in some cancers, including leukemia, can be surveyed with
the subtelomeric probes to detect subtle rearrangements or to further characterize cytogenetically
visible abnormalities.

In another aspect of the present invention, a subtelomeric probe useful for detecting
chromosomal rearrangements is provided. The probe generally comprises a single copy DNA
sequence having a length of less than 25 kb and more preferably less than 10 kb, wherein the
sequence is capable of hybridizing to the terminal G-band or R-band of an arm of a single
chromosome. When G-banding is used, the terminal band is light-staining and when R-banding
is used, the terminal band is dark staining. Chromosome arms for this invention aspect include
1p, 1q, 2p, 2q, 3p, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 11p, 11q, 12p, 12q,
13q, 14p, 14q, 15p, 15q, 16p, 16q, 17p, 17q, 18q, 19p, 19q, 20p, 20q, 21p, 21q, 22p, 22q, Xp,
Xq, and Yp. Exemplary probes are generally selected from the group consisting of SEQ ID NOS.
1-3, 5-23, 26-36, 38-57, 59-61, 63-67, 69-82, and 245-251. Preferably, the probe is within 8000 kb
of the telomere of the chromosome. In this respect, exemplary probes include SEQ ID NOS. 1-3,
5-23, 26-36, 38-57, 59-61, 63-67, 69-82, and 245-251. More preferably, the probe is within 300 kb
of the telomere of the chromosome. In this respect, probes selected from the group consisting of
SEQ ID NOS. 36, 80, 46, 47, 49, 51, 56, 248, 57, 78, 59, 75, 76, 74, 63, 250, 251, 66, 65, 67, 4, 3,
1, 9, 6, 11, 10, 17, 20, 19, 18, 21, 81, 26, 29, 28, 31, 32, 43, 42, 41, 40, 44, 45, and 70 are preferred.
Moreover, preferred probes are either labeled or modified to attach to a surface.

In another aspect of the present invention, a method of developing single copy DNA
sequence probes from subtelomeric regions of chromosomes is provided. The probes are capable
of hybridizing to a single location in the genome of an individual and the method generally
comprises the steps of searching the DNA sequence of the chromosome on a
nucleotide-by-nucleotide basis beginning at the terminal nucleotide for a single copy interval of at
least 500 base pairs in length that is closest to said terminal nucleotide, identifying a single copy interval,
synthesizing the identified single copy interval, and using the synthesized single copy interval
as a probe. Preferred methods include the step of verifying computationally or experimentally
that the identified single copy interval is represented at a single genomic location or where
paralogous sequences are closely linked so that only a single signal is detected. In this respect,
it is preferred that the single copy sequence is labeled. Additionally, it is preferred that the
identifying step includes verifying both computationally and experimentally. Preferred methods
of computational verification include using software to determine that the probe sequence is
located at a single position in the genome. Preferred methods of experimental verification
include rehybridizing the single copy probe to the chromosome and visualizing said probe on the
terminal band and correct arm of the chromosome. Preferred single copy intervals are selected
from the group consisting of SEQ ID NOS. 1-3, 5-23, 26-36, 38-57, 59-61, 63-67, 69-82, and
245-251. The method may also include the step of preannealing the single copy probe with
highly repetitive DNA.
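
The nucleotide-by-nucleotide search step recited above can be illustrated, under stated assumptions, by the following Python sketch. It assumes that a per-base annotation of single copy status is already available (for example, from a repeat and paralogy analysis of the genome draft); that annotation, the function name and the example mask are hypothetical. Only the 500 base pair minimum length and the requirement that the interval be the one closest to the terminal nucleotide come from the description.

```python
def nearest_single_copy_interval(is_single_copy, min_length=500):
    """Scan from the terminal nucleotide (index 0) and return the first
    maximal run of single copy bases at least min_length long, as a
    half-open (start, end) pair, or None if no such run exists."""
    run_start = None
    for i, flag in enumerate(is_single_copy):
        if flag:
            if run_start is None:
                run_start = i  # a new single copy run begins here
        else:
            if run_start is not None and i - run_start >= min_length:
                return (run_start, i)
            run_start = None  # run too short or no run open; keep scanning
    # Handle a run that extends to the end of the annotated sequence.
    if run_start is not None and len(is_single_copy) - run_start >= min_length:
        return (run_start, len(is_single_copy))
    return None


# Invented mask: 2000 repetitive bases, a 300 bp single copy run (too short),
# 100 repetitive bases, an 800 bp single copy run, then 50 repetitive bases.
mask = [False] * 2000 + [True] * 300 + [False] * 100 + [True] * 800 + [False] * 50
interval = nearest_single_copy_interval(mask)
```

In this invented example the 300 base pair run is skipped because it is shorter than the minimum, and the interval returned is the 800 base pair run beginning 2,400 bases from the terminal nucleotide. The returned interval would still be subject to the computational and experimental verification steps described above.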

In yet another aspect of the present invention, a synthetic single copy polynucleotide for
identifying chromosomal rearrangements is provided. The polynucleotide is preferably located
within 8,000 kb of the terminal nucleotide of a chromosome and is capable of hybridizing to a
single location on a specific chromosome when no chromosomal rearrangement has occurred.
Preferred polynucleotides have a length of less than 25 kb and are found in the terminal G-band
or R-band of said specific chromosome. Preferred polynucleotides are selected from the group
consisting of SEQ ID NOS. 1-3, 5-23, 26-36, 38-57, 59-61, 63-67, 69-82, and 245-251.
Particularly preferred polynucleotides are located within about 300 kb of the terminal nucleotide
of a specific chromosome. Particularly preferred polynucleotides include polynucleotides
selected from the group consisting of SEQ ID NOS. 36, 80, 46, 47, 49, 51, 56, 248, 57, 78, 59,
75, 76, 74, 63, 250, 251, 66, 65, 67, 4, 3, 1, 9, 6, 11, 10, 17, 20, 19, 18, 21, 81, 26, 29, 28, 31, 32,
43, 42, 41, 40, 44, 45, and 70. It is preferred that the polynucleotides are either labeled or
chemically modified to attach to a surface.

In another aspect of the present invention, an oligonucleotide primer pair used for 
deriving single copy probes that can detect chromosomal rearrangements is provided. The 
primers are preferably selected from the group consisting of SEQ ID NOS. 83-244. 

In yet another aspect of the present invention, an improved synthetic DNA probe operable
for detecting chromosomal rearrangements is provided. The probe includes a DNA sequence
capable of hybridizing to a location on a chromosome arm. The improvement of the probe is that
the probe has a length of less than 25 kb. Additionally, the improvement is that the probe is a
single copy sequence with at least a portion of the probe being located closer to the end of a
telomere on a chromosome than a clone selected from the group consisting of cosmids, fosmids,
bacteriophage, P1, and PAC clones derived from half-YACs. Preferably, the entire probe is
located closer to the end of a telomere on a chromosome than the previously referenced clones.
Preferred chromosome arms for this aspect of the present invention include an arm selected from
the group consisting of 2p, 3p, 7p, 8p, 10p, 11p, 16p, Xp, Yp, 1q, 3q, 4q, 6q, 7q, 8q, 9q, 10q,
12q, 13q, 14q, 15q, 16q, 17q, 18q, 20q, 22q, and Xq. Preferably the probe is located within
8,000 kb of the terminal nucleotide of the telomere of a chromosome. Still more preferably, the
probe is located within 300 kb of the terminal nucleotide of the telomere of a chromosome. In
preferred forms, the probe is located in the terminal G-band or R-band of said chromosome.
Preferred probes for this aspect of the invention include probes selected from the group
consisting of SEQ ID NOS. 46, 47, 49, 56, 78, 59, 64, 249, 2, 4, 3, 5, 9, 11, 20, 19, 21, 81, 246,
70, 72, 73, 36, 80, 247, 50, 57, 75, 76, 74, 63, 250, 66, 65, 67, 1, 6, 10, 12, 16, 15, 13, 14, 17, 18,
81, 245, 26, 31, 32, 43, 42, 41, 40, 44, and 45.

In another aspect of the present invention, a method of screening an individual for
cytogenetic abnormalities is provided. The individual should be diagnosed with idiopathic
mental retardation based on a common set of clinical findings. Additionally, the individual
should exhibit at least one clinical abnormality associated with idiopathic mental retardation.
The method generally comprises the steps of screening the genome of the individual using a
plurality of hybridization probes, wherein each of the probes has a length of less than about 25
kb, and detecting hybridization patterns of the probes, wherein the hybridization patterns will
indicate cytogenetic abnormalities in the individual's genome. Preferably, at least one probe
from each chromosome arm should be used in the assay. However, in some situations, only
certain chromosome arms will need to be assayed because the clinical abnormality or the
common set of clinical findings may be associated with a subset of the entire set of chromosome
arms. The method may further include the step of associating the hybridization patterns with
specific clinical abnormalities. Preferably, the probes are single copy probes, meaning that they
are either represented at a single genomic location or have paralogous sequences that are closely
linked so that only a single hybridization signal is detected.

In another aspect of the present invention, a method of delineating the extent of a
chromosome imbalance is provided. The method generally includes the steps of assaying a
chromosome arm using a plurality of hybridization probes having a length of less than about 25
kb, detecting hybridization patterns of the probes on the arm, and comparing the hybridization
patterns with a standard genome map of the arm in order to delineate the extent of a chromosome
imbalance. Such a method may be performed on a plurality of chromosome arms. The arm(s)
assayed may be selected due to a common set of clinical findings for the individual or the clinical
abnormality may be associated with one or more arms. The method may further include the step
of correlating imbalances on the arm with a medical condition. Preferred medical conditions
include idiopathic mental retardation and cancer.


BRIEF DESCRIPTION OF THE DRAWINGS 
The patent or application file contains at least one drawing, in the form of photographs, 
executed in color. Copies of this patent or patent application publication with color drawing(s) 
will be provided by the Office upon request and payment of the necessary fee. 
Figure 1 is a series of twelve photographs depicting various probes hybridizing to specific
chromosome locations on various chromosomes. These images are enlarged in Figures 2-13;
Fig. 2 is a photograph of a 2.6 kb probe hybridizing to chromosome 5q;
Fig. 3 is a photograph of a 2.5 kb probe hybridizing to chromosome 7q;
Fig. 4 is a photograph of a 2.2 and a 2.4 kb probe hybridizing to chromosome 9q;
Fig. 5 is a photograph of a 3.2 kb probe hybridizing to chromosome 13q;
Fig. 6 is a photograph of a 3.8 and a 1.8 kb probe hybridizing to chromosome 14q;
Fig. 7 is a photograph of a 2.6 kb probe hybridizing to chromosome 17p;
Fig. 8 is a photograph of a 2.5 kb probe hybridizing to chromosome 18q;
Fig. 9 is a photograph of a 2.0 kb probe hybridizing to chromosome 19q;
Fig. 10 is a photograph of a 2.6 kb probe hybridizing to chromosome 20p;
Fig. 11 is a photograph of a 2.1, 3.0 and a 3.7 kb probe hybridizing to chromosome 20q;
Fig. 12 is a photograph of a 3.5 kb probe hybridizing to chromosome 22q;
Fig. 13 is a photograph of a 2.5 kb probe hybridizing to chromosome Xq; and
Fig. 14 is a photograph of a 2.3 kb probe hybridizing to chromosome 19q.
Fig. 15 is a series of photographs of various probes localized on specific chromosomal
arms;
Fig. 16 is a schematic drawing of the structure of a chromosome end depicting the
location of single copy probes in relation to the telomere;
Fig. 17 is a schematic drawing of various gene locations in the 13q arm and their relation
to a prior art probe and to a single copy probe in accordance with the present invention;
Fig. 18 is a photograph of a single copy chromosome 18q probe (2530 bp in length)
hybridized to a metaphase spread with an abnormal or derivative chromosome 6 and normal
chromosome 18; and
Fig. 19 is a photograph of two single copy subtelomeric probes for chromosomes 14q
(1984 bp) and 3p (2093 bp) hybridized to normal metaphase cells.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 
The following examples set forth preferred embodiments of the present invention. It is 
to be understood that these examples are provided by way of illustration and nothing therein 
should be taken as a limitation upon the overall scope of the invention. 

Example 1 

This example describes the process of developing single copy probes in accordance with 
the present invention. 



Materials and Methods: 

Development of subtelomeric single copy FISH probes for all human chromosomes and 
testing them by hybridizing them to normal human chromosomes. 

Probe design. Probe sequences are designed and verified from the April 2001, June 2002,
and November 2002 human genome drafts, and the Celera Genomics human genome sequence,
as described previously (Rogan et al., Sequence-Based Design of Single-Copy Genomic DNA
Probes for Fluorescence In Situ Hybridization, 11 Genome Research 1086-1094 (2001), the
contents and teachings of which are hereby incorporated by reference). The primary objective
is to select single copy probes that recognize a single genomic location adjacent to the telomeres
of each euchromatic chromosomal arm. This poses unique challenges for chromosomal termini
that have evolved by paralogous duplication events. Paralogous non-allelic duplications are
detected by comparing the sequences of target single copy intervals with the remainder of the
genome. The BLAT server at the National Laboratory of Medicine is used to test for similarities
to other non-allelic sequences in the public human genome draft, whereas the Celera sequence
is searched locally on a Sun workstation using BLAST. Non-allelic sequence blocks of <500 bp
in length and/or <80% sequence identity are not considered as potential sites for
cross-hybridization, because such sequence similarities would not be detectable by FISH.
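
The two thresholds above can be read as a simple filter over similarity-search hits. The following is a minimal sketch, assuming the hits have already been parsed into records and that the probe's own (allelic) location has been excluded; the `Hit` record and the function names are illustrative and are not part of any BLAT or BLAST output format.

```python
from dataclasses import dataclass

MIN_BLOCK_BP = 500        # blocks shorter than this are not considered (threshold from the text)
MIN_IDENTITY_PCT = 80.0   # blocks below this identity are not considered (threshold from the text)


@dataclass
class Hit:
    """One non-allelic similarity block (hypothetical record layout)."""
    chrom: str
    start: int            # 1-based, inclusive
    end: int              # 1-based, inclusive
    identity_pct: float

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def is_potential_cross_hybridization(hit: Hit) -> bool:
    # A block is kept only if it is BOTH long enough and similar enough;
    # failing either threshold removes it from consideration.
    return hit.length >= MIN_BLOCK_BP and hit.identity_pct >= MIN_IDENTITY_PCT


def paralog_candidates(hits):
    """Return the hits that could plausibly cross-hybridize in FISH."""
    return [h for h in hits if is_potential_cross_hybridization(h)]
```

A candidate interval with an empty `paralog_candidates` list would, under these assumptions, be treated as computationally single copy.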

Single copy intervals are sought within successive 100 kb intervals from each
chromosome end. If a single copy interval of at least ~1.8 kb in length can be located within the
first 100 kb of subtelomeric sequence (and which does not computationally cross-hybridize
elsewhere in the genome), then this interval is selected as a probe. Otherwise, adjacent 100 kb
genomic intervals are searched for candidate single copy probe sequences until adequate
probe(s) can be identified. The majority of the previously developed single copy probes are
within 200 kb of the telomere. Although a longer chromosomal probe is generally desired, a
probe of 1.5 kb can generally be developed from a 1.8 kb single copy interval and visualized by
FISH.
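
The window-by-window search can be sketched as follows. This is an illustration only: intervals are given as distances (in bp) from the chromosome end, an interval is assigned to the window containing its start, and `cross_hybridizes` is a hypothetical callback standing in for the computational paralogy check.

```python
WINDOW_BP = 100_000       # successive search windows (from the text)
MIN_INTERVAL_BP = 1_800   # minimum single copy interval length (from the text)


def first_usable_interval(single_copy_intervals, max_distance_bp,
                          cross_hybridizes=lambda interval: False):
    """Return the first (start, end) interval that qualifies, scanning outward
    from the chromosome end in 100 kb windows, or None if none is found.

    single_copy_intervals: iterable of (start, end), inclusive distances in bp
    from the chromosome end (an assumed representation).
    """
    ordered = sorted(single_copy_intervals)
    for window_start in range(0, max_distance_bp, WINDOW_BP):
        window_end = window_start + WINDOW_BP
        for start, end in ordered:
            if not (window_start <= start < window_end):
                continue                      # belongs to another window
            if end - start + 1 < MIN_INTERVAL_BP:
                continue                      # too short to yield a visible probe
            if cross_hybridizes((start, end)):
                continue                      # computationally not single copy
            return (start, end)
    return None
```

Because windows are visited in order of increasing distance, the returned interval is the qualifying one closest to the chromosome end under these assumptions.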

Probe generation, labeling and FISH. A single DNA fragment for each chromosomal
region is amplified using long PCR procedures with Pfx-Taq (Invitrogen, Inc.). Experimental
optimization involved running a series of PCR reactions, each with a different annealing
temperature bracketing the predicted annealing temperatures of the primers, to determine the
highest possible temperature that produced a homogeneous-sized amplification product.
Specificity was also optimized by varying the concentration of PCR enhancer solution according
to the manufacturer's recommendations. If no amplification is achieved with a given primer set
under a range of temperatures and enhancer concentrations, an alternative adjacent single copy
interval is selected for probe development. The fragments are then isolated by conventional
techniques including column purification or gel electrophoresis to remove any potentially
contaminating repetitive sequences and purified from low temperature agarose using Micro-spin
columns (Millipore) or by preparative non-denaturing high performance liquid chromatography
(Transgenomic, Omaha NE). The probe fragments are then directly labeled by nick translation
using a modified or directly-labeled nucleotide (e.g., digoxigenin-dNTP, fluorochrome-dNTP, etc.).
The labeled probes are denatured and hybridized to fixed, denatured chromosomal preparations
immobilized on microscope slides. The probes are hybridized to chromosomes of two
individuals according to conventional FISH methods (Knoll and Lichter, In Situ Hybridization
to Metaphase Chromosomes and Interphase Nuclei, Current Protocols in Human Genetics, Vol.
1, Unit 4.3 (eds. N.C. Dracopoli et al.) (1994), the teachings and content of which are hereby
incorporated by reference). Probe hybridizations are detected by binding the labeled nucleotide
with fluorescently-labeled antibody and viewing with fluorescence microscopy with appropriate
filter sets. The total chromosomal DNA is counterstained with 4',6-diamidino-2-phenylindole
(blue) and the hybridized probe signals are visualized with fluorochromes.
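
The annealing-temperature optimization described above amounts to picking the highest temperature in the series that still gives a single, homogeneous product. A minimal sketch, assuming each reaction has been scored by the number of distinct product sizes seen on a gel (0 meaning no amplification); the scoring scheme and the example temperatures are illustrative, not taken from the experiments.

```python
def highest_clean_annealing_temp(band_counts):
    """band_counts maps annealing temperature (deg C) to the number of distinct
    product sizes observed (hypothetical scoring: 0 = none, 1 = homogeneous).

    Returns the highest temperature with exactly one product, or None, in
    which case an adjacent single copy interval would be tried instead.
    """
    clean = [temp for temp, bands in band_counts.items() if bands == 1]
    return max(clean) if clean else None


# Illustrative series bracketing a predicted annealing temperature.
example = {61.8: 2, 64.0: 1, 66.8: 1, 68.5: 0}
best = highest_clean_annealing_temp(example)
```

Returning `None` rather than raising keeps the fallback (selecting an alternative interval) in the caller's hands.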

Validation. Each autosomal subtelomeric probe hybridizes to a homologous chromosome
pair in normal female or male cells (2 signals are expected). Probes from X chromosomes
hybridize to a single chromosome in male cells and to 2 chromosomes in females. Probes from
the Y chromosome hybridize only to male cells. Parallel hybridizations on two different
individuals are performed to confirm chromosome band location. Control hybridizations are
performed in parallel with probes that have been previously validated. A minimum of 10
metaphase cells are scored to determine hybridization efficiency for each probe. Generally,
conventional FISH probes and single copy FISH probes have hybridization efficiency of at least
90%, more preferably at least 92%, still more preferably at least 94%, still more preferably at
least 96%, still more preferably at least 98%, and most preferably 100%.
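
The expected signal counts and the efficiency calculation can be sketched as below. The text does not define hybridization efficiency precisely; this sketch assumes it is the percentage of scored metaphase cells showing exactly the expected number of signals, and the function names are illustrative.

```python
MIN_CELLS = 10  # minimum number of metaphase cells scored (from the text)


def expected_signals(chromosome: str, sex: str) -> int:
    """Expected number of hybridization signals per normal metaphase cell."""
    if chromosome == "X":
        return 2 if sex == "female" else 1
    if chromosome == "Y":
        return 0 if sex == "female" else 1
    return 2  # autosomes: one signal on each homolog


def hybridization_efficiency(signal_counts, chromosome: str, sex: str) -> float:
    """Percentage of scored cells with the expected number of signals
    (an assumed definition of efficiency)."""
    if len(signal_counts) < MIN_CELLS:
        raise ValueError(f"score at least {MIN_CELLS} metaphase cells")
    expected = expected_signals(chromosome, sex)
    if expected == 0:
        raise ValueError("no signal is expected for a Y probe on female cells")
    matching = sum(1 for n in signal_counts if n == expected)
    return 100.0 * matching / len(signal_counts)
```

For example, nine cells with two signals and one cell with a single signal for an autosomal probe give an efficiency of 90%, the lower bound mentioned above.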

If a probe indiscriminately hybridizes to many locations on chromosomes, it most likely
contains moderately to highly repetitive genomic sequences. Although the present repetitive
sequence database is quite comprehensive and this pattern of hybridization is uncommon, it has
been observed for a minority of probes. Such a result indicates a repetitive sequence family in
the human genome that has not yet been characterized at the DNA sequence level. Based on our
previous experience in designing single copy probes, only a minority of probes hybridize
non-specifically to non-catalogued, interspersed repetitive sequence families that would be distributed
throughout the genome. Probes with genome-wide cross-hybridization or cross-hybridization
to highly reiterated sequences can be preannealed to C₀t-1 DNA. Cross-hybridization can be
suppressed or eliminated by preannealing with highly repetitive (i.e., C₀t-1) DNA. If the
hybridization of single copy sequences within the probe is quenched, then an adjacent single copy
interval is selected for probe development.


Characterization of probes that hybridize to more than one chromosomal region. 

In addition to highly-repetitive sequence families in probes that were designed to be
single copy, we have unexpectedly observed a pattern of hybridization to a limited set of discrete
loci on metaphase chromosomes, in addition to the chromosomal site from which the probe was
designed. This hybridization pattern results when the probe contains complex, low-reiteration
frequency sequences that are highly related to sequences on other chromosomes or to other
sequences on the same chromosome; these are known as paralogous sequences. This
hybridization pattern may arise because the genome sequence is either inaccurate or not yet
complete. Indeed, the human genome sequence is acknowledged to be incomplete, especially
in regions containing heterochromatin. Paralogous copies of single copy sequences embedded
within such regions are not likely to be comprehensively incorporated in the current genome
draft. Other regions of the genome that have not been assembled completely or correctly are
indicated in the draft by "gap" intervals. Paralogous or duplicate copies of single copy probes
in these regions could also be responsible for unexpected hybridization to non-allelic loci. The
software used to select probes is capable of detecting related genomic sequences in silico;
however, because the genome sequence is not yet finished, there is always the possibility that a
particular probe could anneal to other uncharacterized, related sequences on other chromosomes
or the same chromosomes. If cross-hybridization to a discrete pattern of chromosomal loci is not
suppressed by preannealing the original probe with highly repetitive DNA (e.g., see results for
chromosome 16 in Table 1), this indicates that the probe contains one or more paralogous
sequences (i.e., sequences present at low copy) rather than a highly repetitive one.
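
The reasoning in the preceding paragraphs can be summarized as a small decision rule. This is only a sketch of that reasoning: the pattern labels and return strings are illustrative, and the case the text does not address (a genome-wide pattern that is not suppressed) is reported as unresolved rather than guessed.

```python
def interpret_hybridization(pattern: str, suppressed_by_cot1: bool) -> str:
    """Interpret a FISH result for a probe designed to be single copy.

    pattern: "single" (only the intended locus), "discrete" (a limited set of
    additional loci), or "genome_wide" (many locations); labels are hypothetical.
    suppressed_by_cot1: whether the extra signals disappear after preannealing
    with highly repetitive DNA (ignored when pattern is "single").
    """
    if pattern not in ("single", "discrete", "genome_wide"):
        raise ValueError(f"unknown pattern: {pattern!r}")
    if pattern == "single":
        return "single copy"
    if suppressed_by_cot1:
        return "repetitive sequence present; usable with preannealing"
    if pattern == "discrete":
        return "paralogous low-copy sequence present"
    return "unresolved"
```

A probe classified as paralogous would then be a candidate for the bisection or interval-replacement strategies described next.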



Table 1. Summary of subtelomeric scFISH probes validated by chromosomal
hybridization

Columns: Approximate, Actual, Chromosome, Target, Size

*cross-hybridization observed on other chromosomes.

**cross-hybridization may be present; additional verification required.

***cross-hybridization occurred despite C₀t-1 suppression.

^ hybridization was detected when probe was combined with other 10ptel probes
labeled with "^".

+ hybridization was detected when probe was combined with other 10ptel probes
labeled with "*".

Assuming subsequent versions of the genome assembly are more accurate than the April
2001 version, the probe sequence can be compared to more recent versions to determine if
additional sequences related to the original probes are present in these versions. To identify
paralogs, the probe sequence is compared with the genome drafts, allowing for a lower degree
of sequence similarity to the duplicated copies. If the more recent genome sequence drafts reveal
the presence of related sequences, two distinct strategies are available for producing
chromosome-specific probes where paralogs are present in other bands on this or other
chromosomes: (1) bisecting the probe - if the initial probe is sufficiently long - and
reamplification of the non-paralogous region of the probe or (2) selecting a different single copy
interval not containing any genomic paralogs for probe development. If a related sequence is not
identified by sequence analysis, then internal primers are developed to bisect the original probe
into sequences that are chromosome-specific.

The original probe can be bisected to determine which component hybridizes to the
multiple sites. Bisection of the product occurs by developing internal primers and possibly new
end primers (with similar melting temperatures and GC composition) that result in two smaller
products. These new products serve as probes for single copy FISH. If cross-hybridization
remains after bisection, further dissection of the probe may be possible or a new single copy
probe from the neighboring genomic interval is designed and assessed by FISH.

After bisecting the original probe, one of two patterns of hybridization is expected. That
is, one product is chromosome-specific and the other hybridizes to other chromosomal regions,
or both products still show multiple sites of hybridization. The former pattern localizes the
region that contains the repetitive or paralogous sequence, while the latter does not localize the
region but rather indicates that the internal primer set spans the repetitive or paralogous sequence.
To date, we can reliably visualize fragments that are 1500 bp or greater in length by
fluorescence microscopy. Thus, when a probe is bisected, we endeavor to produce probes that
are at least 1500 bp. Shorter probes can also be combined that have a total target size of at least
1500 bp. A probe has been developed with this procedure that detects only chromosome 4p
terminal sequences by bisecting a larger probe that cross-hybridizes to paralogous sequences on
other chromosomes. Alternative single copy intervals adjacent to the initial cross-hybridizing
sequence are selected if the bisected probe cannot be designed to be at least 1.5 kb in length or
because of extensive paralogy to non-allelic sequences that extend throughout the length of the
probe sequence.
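
The size constraints on bisected products can be checked with a few lines of arithmetic. The sketch below assumes products are described by inclusive coordinates and that the split position is the first base of the right-hand product; the function names are illustrative.

```python
MIN_VISIBLE_BP = 1500  # smallest fragment reliably visualized (from the text)


def product_length(product) -> int:
    start, end = product
    return end - start + 1


def bisect_probe(start: int, end: int, split: int):
    """Split a probe spanning start..end (inclusive) into two products;
    `split` is the first base of the right-hand product."""
    if not (start < split <= end):
        raise ValueError("split must fall inside the probe")
    return (start, split - 1), (split, end)


def individually_visible(products):
    """Products long enough to be used on their own."""
    return [p for p in products if product_length(p) >= MIN_VISIBLE_BP]


def combined_target_ok(products) -> bool:
    """Shorter probes may be combined if their total target size is sufficient."""
    return sum(product_length(p) for p in products) >= MIN_VISIBLE_BP
```

If neither product is individually visible and no chromosome-specific combination reaches 1500 bp, the text's fallback is to select an adjacent single copy interval.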

Ensuring that probes are close to the ends of chromosomes; and revising, as appropriate,
probes closer to the chromosomal ends.

The locations of the probes designed from the April 2001 genome draft are
computationally compared to their locations on the more recent genome draft versions. If the
position coordinates have shifted further from the end of the chromosome, then new single copy
probes closer to the end of the chromosome were designed. From the April 2001 draft, 46
subtelomeric probes that detect single copy targets were validated, and an additional 36
subtelomeric single copy probes have been designed from subsequent versions of the genome
sequence and mapped. Development of new probes was contingent on the subtelomeric intervals
being free of repetitive sequences and paralogs on other chromosomes. By developing probes
as close to the ends of chromosomes as possible, we increase the likelihood of detecting terminal
rearrangements that would not be evident using existing cloned probes.

Results: 

Compared to conventional subtelomeric FISH probes, the subtelomeric single copy
probes that we developed in accordance with the present invention detected smaller
rearrangements of terminal chromosome sequences (that result from deletion or unbalanced,
cryptic translocations of these genomic regions) than was previously possible. The present set
of probes has been designed to detect all of the euchromatic sequenced subtelomeric regions.
Primers have been designed that recognize unique sequences within each subtelomeric region,
and single copy probes have been developed and validated for the subtelomeric regions of
chromosomes 1, 3, 5q, 7, 8, 9q, 10p, 11, 14q, 16q, 17, 19, 20q, Xp, and Yp (see Table 2).
Because these sequences are unique and the corresponding human genome sequence is publicly
available, the primers themselves define one and only one product in the genome. Therefore,
some of the primers listed in SEQ ID NOS. 83-244 are equivalent to the products listed in SEQ
ID NOS. 1-3, 5-23, 26-36, 38-57, 59-61, 63-67, 69-82, and 245-251.
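
Because each primer pair defines a single product, the product length follows directly from the coordinate ranges given in Table 2. The sketch below assumes the "start_endF, start_endR" notation used there, with the reverse primer's coordinates written in either order; the parser is illustrative and not part of the original method.

```python
import re

_RANGE = re.compile(r"\s*(\d+)_(\d+)F,\s*(\d+)_(\d+)R\s*")


def parse_range(text: str):
    """Parse a Table 2 style range such as '17994_18023F, 20024_19995R' into
    ((forward_start, forward_end), (reverse_start, reverse_end)), each ordered
    low to high."""
    match = _RANGE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized range: {text!r}")
    f1, f2, r1, r2 = (int(g) for g in match.groups())
    return (min(f1, f2), max(f1, f2)), (min(r1, r2), max(r1, r2))


def product_length(text: str) -> int:
    """Length of the amplified product, counting both primers (inclusive)."""
    forward, reverse = parse_range(text)
    return reverse[1] - forward[0] + 1


length_1ptel = product_length("17994_18023F, 20024_19995R")
length_2qtel = product_length("247101356_247101385F, 247104869_247104899R")
```

For the two ranges shown, the computed lengths agree with the Length values listed for those entries in Table 2 (2031 and 3544).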



Table 



2. Primer sequences and locations 





Chromosome coordinate Range (forward, reverse primer )* 
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computed 




t-ally 










T m predicted 


Optimizec 


Band 


Sequences of Primer Pair 










lptel 














Range 


17994_18023F, 20024_19995R 












Length 


2031 












Forward 


TCTGCGGCTGACCTGGCCTCCACGTCTCAC 


SEQ ID 1 


69.5 


69.65 






Reverse 


CTACCCGTCTCCCACCCCCTCTCCCCACCC 


SEQ ID 2 


69.8 




/o.Z 
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78.2 
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Range 


20726_20755F, 22139_22168R 












Length 


1433 


SEQ ID 3 










Forward 


CCCTAAACTCCTCCCTATCCCTTCTCAATC 
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59.05 






Reverse 


AAAAAAAACCTCATTTCCTCCCCAAAGC 
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59.0 
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Range 


2786 1 5828_2786 1 5 859F, 2786 1 789 1_2786 1 7924R 
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AGTTCCTAAACAACTATGAGCTAAAGTATCAG 
CTTTTAAGTGTGAAGAGTTAAGAAGTATCATGT 


SEQ ID 5 
SEQ ID 6 
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55.3 
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C 
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Range 


278592693_278592722F, 2785945 16J278594545R 












Length 


1853 












Forward 


TTGATGTTTATGTCCAGATTTTCTCTTCCC 


SEQ ID 7 


55.9 








Reverse 


GAATCTCAAAATGCTTAACTCCAAAACCAG 


SEQ ID 8 
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Range 


78433_78462F, 80517_80546R 












Length 


2114 












Forward 


CAGAGCATAGTCAAGAGAGGCGCATTTTCC 


SEQ ID 9 


61.4 


Z" 1 AC 

61.45 






Reverse 


AAGAGCCCCTAAATTAGCCCCGTAGAAACC 


SEQ ID 10 


61.5 




66.8 
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66.1 
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Range 


61604_61634F, 64223_64256R 












Length 


2653 












Forward 


GCAAAGACAATGCAAAAAACACTTTACATGG 
GCCTGATATAGGTATATTCAGAGAGCTACAGAA 


SEQ ID 1 1 
SEQ ID 12 


57.6 


57.6 






Reverse 


G 
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61.8 
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2qtel 














Range 


247 1 0 1 356_247 1 0 1 385F, 247 1 04869_247 1 04899R 












Length 


3544 












Forward 


ACTCCCTTTTGGATAATCAAAATGCTCAAC 


SEQ ID 13 


56.7 


56.7 






Reverse 


GCAAAATTACCTTTCAAATGTGTACTTGCTC 


SEQ ID 14 


56.7 




61.8 
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Chromosome coordinate Range (forward, reverse primer )* 


T 
















Experimer 




Length 


computed 




t-ally 










T m predicted 


Optimized 


Band 


Sequences of Primer Pair 










Optimal 














Tm 












Ol.( 


4qtel 














Range 


200662680_200662709F, 200664537_200664508R 












Length 


1857 












Forward 


TTGAAATATGGTACAAAGAAGGGGTTGGAG 


SEQ ID 1 5 


57.3 


57.35 






Reverse 


CTTGAAGTCCTTGCCGAAGAAAAATAGTTG 


SEQ ID 16 


57.4 




OH.O 




Optimal 














Tm 












04. C 


4qtel 














Range 


2006576 14_200657649F, 200660008_200660039R 












Length 


2426 












Forward 


GCTGACTCAAGAACTGTAGCATTGAGTGTAAG 


SEQ ID 1 7 


59.5 


Jy.J 






Reverse 


GGGGAATGCAAGCATATTATATGAGCAGAAGG 


SEQ ID 18 


59.5 




f\A f\ 
OH.O 




Optimal 














Tm 












A/l ^ 
OH.C 


5qtel 














Range 


19520001 1_195200041F, 195202642_195202671R 












Length 


2661 












Forward 


GCAAAGGACCTCTTTAATGCTTATCAGCCAC 


SEQ ID 19 


60.1 


OU.15 






Reverse 


GGTGAGAGCTATGGAAAGCCTCTCCTATTG 


SEQ ID 20 


60.0 




OD. O 




Optimal 














Tm 












OO.c 


Sqtel 














Range 


1 95 1 86729_195 1 86760F, 1 95 1 89493 J 95 1 89523R 












Length 


2795 












Forward 


TTCCAGCCCCACCTGCTCAGGCAGCCTCTATG 


SEQ ID 21 


68.7 


0O.4 






Reverse 


GCCAGCACAGCCTCCTGTCTTAGCCCTGTCC 


SEQ ID 22 


68.1 




7S S 

/J.J 




Optimal 












7^ * 


Tm 












Sqtel 














Range 


195129480^195 129509F, 195131860_195131889R 












Length 


2410 












Forward 


GCGAGAAATGCCTCCCTATTCCCCAGGAGC 


otiQ ID Zo 


65.3 


6A £^ 
OO.OD 






Reverse 


TCCCAGAACTTTGCCTGTTGCCCATGCCAC 


SEQ ID 24 


66.2 




68.1 




Optimal 














Tm 














7ptel 














Range 


20273_20302F, 23115J23144R 












Length 


2872 












Forward 


AGCAGCTCCAGAGCAGGGAACCCACCTCAC 


oLQ ID ZJ 


67.8 


u / .o 






Reverse 


GTGTCCACACCAGGCAGCGTCCAACTCAGC 


SEQ ID 26 


67.8 




72.1 




Optimal 












72.1 


Tm 












7qtel 














Range 


163817881_163817910F, 163821021_163821050R 












Length 


3170 












Forward 


ATGAGGGAGGAGTGGGGAGAGGAAGTGAAG 


SEQ ID 27 


63.3 


63.1 






Reverse 


ACTACCTGGTGTCCAGTACCCAAATCCAGC 


SEQ ID 28 


62.9 




68.5 




Optimal 












68.f 


Tm 












7qte! 
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Chromosome coordinate Range (forward, reverse primer )* 


T 
















Experimer 




Length 


computed 




t-ally 










T m predicted 


Optimizec 


Band 


Sequences of Primer Pair 










Range ! 


1 63771088^163771 1 17F, 163773632_163773661R 












Length 


2574 












Forward 


CCCTCTTTCTGAACACCCCCCGGCAGACAC 


SEQ ID 29 


66.9 


66.3 






Reverse 


TGGCAGGCTGTCCTGGTCGTATTCGAGGTC 


SEQ ID 30 


66.1 




Oi.o 




Optimal 














Tm 














8ptel 














Range 


163906_163935F, 165984_165955R 












Length 


2079 












Forward 


TCTGCTCTCCTGTGCCAAGCGTCAATATGG 


SEQ ID 3 1 


63.7 


/CI Q 






Reverse 


ACCTCTCTGGGTCTCTCTCCTCCTCACTG 


SEQ ID 32 


64.1 




£R 1 
Oo. 1 




Optimal 














Tm 












/TO 1 

68.1 


8ptel 














Range 


131014_131044F, 133255_133284R 












Length 


2271 












Forward 


GCATTTCTCAGAATAATGAATGGCAGGAAATAC 


SEQ ID 33 


57.5 


57.6 






Reverse 


GTGCATGTTTCAAGACATTCTCAGATTGTG 


SEQ ID 34 


57.7 




Ol.O 




Optimal 














Tm 












/c 1 t 


9ptel 














Range 


190285_190314F, 192338_192367R 












Length 


2083 












Forward 


CAAGTTGGTAAATGGAGGCATTATATGGAG 


SEQ ID 35 


56.3 


56.3 






Reverse 


AGTCACGTATCAAGTGGAAATAAAATCGTC 


SEQ ID 36 


56.3 




Ol.O 




Optimal 












/c i i 


Tm 












9qtel 














Range 


141875348_1418775377F, 141878207_141878236R 












Length 


2889 












Forward 


ACAACAGGACAATGCATACAACCACGAAAC 


bbQ YD 5 / 


60.4 








Reverse 


TCATTAGAATGAAAGGGAGCCACAGAGCAG 


SEQ ID 38 


60.3 




(\fs ft 

uu.o 




Optimal 














Tm 












OO.l 


9qtel 














Range 


141889106_141889135F, 141891306_141891337R 












Length 


2232 












Forward 


AGCTCCAGGTAACTCTCAGGCCAGCAGCCC 


SEQ ID 39 


67.6 


67.55 






Reverse 


AAGGAGGAAGTGGAAGCTCAGCCCAGGCAGTG 


SEQ ID 40 


67.5 




72.1 




Optimal 












72 J 


Tm 












9qtel 














Range 


141878644_141878674F, 141881 106^141881 140R 












Length 


2497 












Forward 


TGCTGACCGAGCACATACACAATTCAGTGAC 
AGGGTCTCTGCTAACGTAGTGAAAATACGCAAA 


SEQ ID 41 
SEQ ID 42 


62.6 


62.3 


63.2- 




Reverse 


TG 




62.0 




68.5 




Optimal 












63.2- 


Tm 












68.5 


9qtel 














Range 


141871749_141871778F, 141874426_141874455R 












Length 


2707 
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Chromosome coordinate Range (forward, reverse primer )* 


T 














Experimer 




Length 


computed 




t-ally 








T ra predicted 


Optimized 


Band 


Sequences of Primer Pair 






T m 


Forward 


CTGAGCAGCCACCCTGGATGCTCCTGCACG 


cpn TD A1 


05. 9 








Reverse 


CTCTGGCCCTCGGCCCATTGCCACCTCAAC 


SFO TD 44 






o*+.o 




Optimal 














Tm 












04.C 


9qtel 














Range 


1 4 1 897247_1 4 1 897276F, 1 4 1 899495_14 1 899524R 












Length 


2278 












Forward 


ACAGAAGCAAGCAGAAGTACAGAACCAGAG 


sfo td 45 


CC\ A 
0U.4 


OU.4j 






Reverse 


TTTCTCCCTCCTAGATGATCGACTTGGGAC 


SFO TD 46 


OU..> 




58 4. 




Optimal 














Tm 












JO/ 


9qtel 














Range 


141928044_141928073F, 141930725_141930750R 












Length 


2711 












Forward 


CACCATCTGCATCTTACATCTTATTCCACC 


dea^ YD 4 1 


C7 O 

57.8 


57.75 






Reverse 


AAGTTAATTGGAGGGAAATGGCTGTAAAGG 


CEA jy\ AQ 
oHl^ LU 4o 


57.7 




Ol .O 




Optimal 














Tm 












A1 $ 
01. c 


lOptel 














Range 


230747_230779F, 232879_232848R 












Length 


2132 












Forward 


GAGTTAAGCTCAGCTCACTCTGTGGCACTACC 


cr;n tt-v aq 

oHv TU 4V 




64 






Reverse 


GGAAGTGTCTGTGGTTTGCCAGCTCCTGTTCT 


oJ2V^ 11./ D\J 






04 




Optimal 














Tm 












O* 


Range 


185297_185326F, 187348_187319R 












Length 


2051 












Forward 


GATTCTGACCCTTGCCCAGCCTACGTCTCG 


cpO TD SI 




04 






Reverse 


TGACCCACAATCTTTCCCTTCTGGCACCAC 


SFO TD 52 






o*t 




Optimal 














Tm 














Range 


201244_201278F, 204448_204479R 












Length 


3203 












Forward 


GATGTTTCTAACTATACCTTTATGTGTTTTTCCT 


SFO TD 51 




S7 






Reverse 


GCTCTTCCTACCAAGTTATCTTCATCTATTCG 


SFO TD 54 






D / 




Optimal 














Tm 












5: 


Range 


20032_20062F, 22558_22527R 












Length 


2526 












Forward 


CCAGATACTGGTCTCATTCTTGGGCAGTTTC 


SFO TD 55 




01 






Reverse 


CCGAGTTTGACTTTCACTCACTCACCTAGATG 


SFO TD 56 






O I 




Optimal 














Tm 












lOqtel 














Range 


144785104_144785133F, 144786894_144786923R 












Length 


1820 












Forward 


AATGAAAGGGATACGTTTGCGTCTGTCCTG 


SEQ ID 57 


61.1 


61.05 






Reverse 


GGTAAAGTTCTTCCCCTGGCTCTTCACAAC 


SEQ ID 58 


61 




66.8 




Optimal 












66.} 


Tm 












lOqtel 














Range 


144752659_144752688F, 144756387_144756416R 













-37- 





Chromosome coordinate Range (forward, reverse primer )* 


T 

A m 
















Experimer 




Length 


computed 




t-ally 










T m predicted 


Optimizec 


Band 


Sequences of Primer Pair 










Length 


3758 












Forward 


ATTTTAGTGAAGAAACTTGCTGTGGAGTCG 


SEQ ID 59 


58.1 


CO AC 

58.05 






Reverse 


AAGAAGAAGGAAAGAACAAGAAAAGCCCAG 


SEQ ID 60 


58.0 




00. o 




Optimal 














Tm 












OO.O 


lOqtel 














Range 


144746646_144746677F, 14475 1955_144751985R 












Length 


5340 












Forward 


CCACACCCAGCCAACAGCAGACGTGATGGAAG 


SEQ ID 6 1 


67.2 


i 

O /.I 






Reverse 


CTGAGGAGACAGGTGGGACAGAGGGGCAGAC 


SEQ ID 62 ; 


67.0 




£8 1 
Oo. 1 




Optimal 














Tm 












Oo. 1 


11ptel
Range: 16421_16450F, 19275_19304R
Length: 2884
Forward: GCTCCTCCCCACACCTGACCCTGCCCTCAC (SEQ ID 63), Tm computed 69.4
Reverse: GAGCTGGCCCGTTTTGCCACCTGTCACCCC (SEQ ID 64), Tm computed 69.5
Tm predicted: 69.45
Optimal Tm: [illegible]

11qtel
Range: 150509268_150509297F, 150511700_150511729R
Length: 2462
Forward: CAACCCGAGAGATGAGCCCTGCGTCCACTG (SEQ ID 65), Tm computed 66.9
Reverse: CACCTGCGTCTTCAAGCCCTAATGGGCACC (SEQ ID 66), Tm computed 66.1
Tm predicted: 66.5
Optimal Tm: 72.1

11qtel
Range: 150528401_150528430F, 150530884_150530913R
Length: 2513
Forward: AATGAAGAAATGAATCTCTCTCCTTGGACG (SEQ ID 67), Tm computed 57.2
Reverse: TTTATCATGTGGCAGGCAATTAAATGACAG (SEQ ID 68), Tm computed 57.0
Tm predicted: 57.1
Optimal Tm: 61.8

12ptel
Range: 159378_159407F, 161259_161291R
Length: 1914
Forward: GTGTCCCCAGGCAGAGTTAAGAAAAGAAGC (SEQ ID 69), Tm computed 61.2
Reverse: GCAGGAGTGAAACAACAAAAAATACAGCCAGTC (SEQ ID 70), Tm computed 60.9
Tm predicted: 61.15
Optimal Tm: [illegible]

12ptel
Range: 186089_186118F, 189015_189044R
Length: 2956
Forward: TACTCCTTCCTTCCTTCCCTCAACCCTGAC (SEQ ID 71), Tm computed 62
Reverse: TTTGGGCAGAGTGTGGATGGAGAAGATTGG (SEQ ID 72), Tm computed 62.0
Tm predicted: 62
Optimal Tm: 68.5

12qtel
Range: 146323815_146323844F, 146327241_146327270R
Length: 3456
Forward: TTCAGAAGGTAGAGTTGGAGGATCATAGGC (SEQ ID 73), Tm computed 59.1
Reverse: TCCCCACAGAGTAAACAGTAGGAAGGAAAG (SEQ ID 74), Tm computed 59.3
Tm predicted: 59.2
Optimal Tm: [illegible]

12qtel
Range: 146336097_146336127F, 146338576_146338607R
Length: 2511
Forward: CACAAAAAGATTAAAACACAATCTTGTGAGC (SEQ ID 75), Tm computed 55.5
Reverse: ACTCATCCTTTATTCTTCTAGTAAGAATTGCC (SEQ ID 76), Tm computed 55.5
Tm predicted: 55.5
Optimal Tm: 55.5

13qtel
Range: 118776702_118776731F, 118779881_118779910R
Length: 3209
Forward: TGCCTGCTGACTGAGGGGGATGGCCGGAAC (SEQ ID 77), Tm computed 69.6
Reverse: GGCTGTGGGTGTGCGGGATAGGGGAGGCTC (SEQ ID 78), Tm computed 69.7
Tm predicted: 69.65
Optimal Tm: 64.6-[illegible]

13qtel
Range: 118764062_118764091F, 118767129_118767158R
Length: 3097
Forward: TCCTTGCTGCACTACCTACCCATGCAGGCG (SEQ ID 79), Tm computed 66.8
Reverse: GGTCACCGGGAGGAAGCCACACATCTGACG (SEQ ID 80), Tm computed 66.9
Tm predicted: 66.85
Optimal Tm: [illegible]

14qtel
Range: 106231822_106231855F, 106234034_106234063R
Length: 2242
Forward: TCTTAGAACATGTGACAGAATCAAAAAATTCC (SEQ ID 81), Tm computed 55.4
Reverse: TTTAAGAGAATGAAAGTCATACCTGTAGCC (SEQ ID 82), Tm computed 55.3
Tm predicted: 55.35
Optimal Tm: 58.4

14qtel
Range: 106219634_106219663F, 106221499_106221470R
Length: 1866
Forward: TTTCAGACGGTCGAGTGACAGTCCAAACGG (SEQ ID 83), Tm computed 63.7
Reverse: GGAGGCTCTGCTTTCCAGCCAGATGTAAGG (SEQ ID 84), Tm computed 63.8
Tm predicted: 63.75
Optimal Tm: 63.2-71.8

14qtel
Range: 106192496_106192527F, 106196305_106196334R
Length: 3839
Forward: GCATACATCTCCGACACTAGGAAAGACACGAC (SEQ ID 85), Tm computed 61.9
Reverse: ATTGGCCTTTCAGCTTGCCCAAACACAAAC (SEQ ID 86), Tm computed 62.7
Tm predicted: 62.3
Optimal Tm: 63.2-68.5

15qtel
Range: 100651272_100651303F, 100653622_100653593R
Length: 2351
Forward: CTTAAAATATCCAGTCTCAGTTTTGTTTCCTC (SEQ ID 87), Tm computed 55.3
Reverse: TTAAATGCAACTCAAAAGAAGAAAGGTCTC (SEQ ID 88), Tm computed 55.2
Tm predicted: 55.25
Optimal Tm: 61.8

15qtel
Range: 100655884_100655914F, 100657490_100657461R
Length: 1607
Forward: [sequence illegible] (SEQ ID 89), Tm computed 56.6
Reverse: CTAAAACCCATAAATTGACCGAACACTCTC (SEQ ID 90), Tm computed 56.6
Tm predicted: 56.6
Optimal Tm: 61.8

15qtel
Range: 100596963_100596992F, 100598878_100598844R
Length: 1916
Forward: GGGATAGATGATGGTTTGTTGTAATTTGAG (SEQ ID 91), Tm computed 55
Reverse: GTCTCTAGATAATCTAATAATATCCACTTCCCAAG (SEQ ID 92), Tm computed 55
Tm predicted: 55
Optimal Tm: [illegible]

16ptel
Range: 17530_17560F, 23932_23961R
Length: 6432
Forward: GCCACGCACTTCCCTGCTGTTTGAAAGACCC (SEQ ID 93), Tm computed 66.6
Reverse: GTGTTTGTCACCCCACTCCTGCTCCTGCCC (SEQ ID 94), Tm computed 67.3
Tm predicted: 66.45
Optimal Tm: [illegible]

16ptel
Range: 24259_24288F, 29479_29508R
Length: 5250
Forward: GTGTCGGTTCTCCACCACCACGATGAGCCC (SEQ ID 95), Tm computed 67.1
Reverse: TCCCGCCTAGCAGAGTTGCTGTCTGGCAAG (SEQ ID 96), Tm computed 66.7
Tm predicted: [illegible]
Optimal Tm: 68.1

16qtel
Range: 102168227_102168256F, 102170764_102170793R
Length: 2567
Forward: AGTTCTCTGCTTCTTCCTTGTTTTCTCTCC (SEQ ID 97), Tm computed 58.7
Reverse: TCCCTTTTTGCTTCTCTGTGTTGTGATTTC (SEQ ID 98), Tm computed 58.5
Tm predicted: [illegible]
Optimal Tm: 61.8

17ptel
Range: 589547_589576F, 592110_592139R
Length: 2593
Forward: TCGGATAAAAGCAGAAGCAGAGAGAGCAGG (SEQ ID 99), Tm computed 61.7
Reverse: AGCCCCCTCCTAAAGGCTGTCACCTATAAG, Tm computed 62.7
Tm predicted: 62.2
Optimal Tm: 68.5

17ptel
Range: 554691_554720F, 559645_559674R
Length: 4984
Forward: ATCCTTTCCTTTTTTGCCTTCTTCCTCATC (SEQ ID 101), Tm computed 57.9
Reverse: CTTCTTTCCTCCCCATCTTCTCCTTCTTAG (SEQ ID 102), Tm computed 58
Tm predicted: 57.95
Optimal Tm: 58.4

17qtel
Range: 88337031_88337060F, 88339899_88339928R
Length: 2898
Forward: GACAGGTTGGGGATCTAGAGAGCTGGGGAG (SEQ ID 103), Tm computed 63.8
Reverse: AAAGGGGGTGTTAGTGAGGGGCCACAAAAG (SEQ ID 104), Tm computed 63.8
Tm predicted: 63.8
Optimal Tm: 71.8

17qtel
Range: 88342552_88342581F, 88345577_88345548R
Length: 3026
Forward: GCAATCAGATTTCTCTCAAACCACGAACAC (SEQ ID 105), Tm computed 59.1
Reverse: TTTATCAGGATATGCGTTTTCCTCCAACCC (SEQ ID 106), Tm computed 59.1
Tm predicted: 59.1
Optimal Tm: 66.8

18ptel
Range: 344433_344465F, 346559_346529R
Length: 2127
Forward: CCTTAACAAACAAACAGAAAAAAAAGAAAGGAG (SEQ ID 107), Tm computed 55.6
Reverse: AGTCCCAATATTTGAACCTAAATGCAAAAAG (SEQ ID 108), Tm computed 55.6
Tm predicted: 55.6
Optimal Tm: [illegible]

18ptel
Range: 335360_335389F, 337727_337697R
Length: 2368
Forward: ATCTTGTTGCATCCTGAGAGAAACAGAATC (SEQ ID 109), Tm computed 57.6
Reverse: CAGGCATCTACTTGAGAACTGACAAACTAC (SEQ ID 110), Tm computed 57.6
Tm predicted: 57.6
Optimal Tm: [illegible]

18qtel
Range: 83822245_83822274F, 83824743_83824774R
Length: 2530
Forward: TGAGAATGTGATTGCCGTTCTGAAAACACC (SEQ ID 111), Tm computed 60.2
Reverse: TCTTTTCTGTGTGCTTGATTCTTGCAGATACAGC (SEQ ID 112), Tm computed 59.9
Tm predicted: [illegible]
Optimal Tm: [illegible]

19ptel
Range: 575_604F, 2360_2389R
Length: 1815
Forward: GGAGAAGGGGAGTTTGCTGGGGAGACGAGG (SEQ ID 113), Tm computed 66.2
Reverse: ACACAATGGAAACAATGGGGAGGGTGGGCG (SEQ ID 114), Tm computed 65.9
Tm predicted: 66.05
Optimal Tm: 72.1

19ptel
Range: 24323_24352F, 26382_26416R
Length: 2094
Forward: ACCTGCCCTGCCACCTCTGTTCTCCCTGCC (SEQ ID 115), Tm computed 69.4
Reverse: CGCCTTTGAGTCAACCAAGCCCCAAGATGCACACC (SEQ ID 116), Tm computed 68.5
Tm predicted: 68.95
Optimal Tm: 61.8

19ptel
Range: 55302_55331F, 59926_59955R
Length: 4654
Forward: ACCACTAAGAGCCCCTGTCACCCTCCAGCC (SEQ ID 117), Tm computed 67.2
Reverse: TTCCCCATTCCCCAGTCCAACACCCCCTCC (SEQ ID 118), Tm computed 67.5
Tm predicted: 67.35
Optimal Tm: 72.1

19qtel
Range: 72318330_72318359F, 72321021_72321050R
Length: 2721
Forward: CAGATGGAGACACTCTCCCTGGGAAATGCC (SEQ ID 119), Tm computed 63.4
Reverse: TTTTGCCTTCCTGCTGCATGACCAGCTAAC (SEQ ID 120), Tm computed 63.2
Tm predicted: 63.3
Optimal Tm: 68.5-71.8

19qtel
Range: 72351418_72351447F, 72353787_72353816R
Length: 2399
Forward: CTCTCTGCTCCACCTCTGGCTTTGACGACG (SEQ ID 121), Tm computed 65.3
Reverse: AGACTGCCTCCCCTCCCCTAACCCAGAATG (SEQ ID 122), Tm computed 65.2
Tm predicted: 65.25
Optimal Tm: 64.6

20ptel
Range: 356009_356039F, 358594_358624R
Length: 2616
Forward: AGTGCCCAGGAAAGACCAGGAAAATACAAG (SEQ ID 123), Tm computed 61
Reverse: GGGAAATAGTAGCGTAAGCTGTCAACTCCAG (SEQ ID 124), Tm computed 60.5
Tm predicted: 60.75
Optimal Tm: 66.8

20ptel
Range: 400061_400095F, 402116_402148R
Length: 2088
Forward: TTCCATTTCCTGCCATCTAAGCAATGCAGACACAG (SEQ ID 125), Tm computed 63.7
Reverse: TGGACTGCTTGCTGGTCGCTTACATCACTTTAC (SEQ ID 126), Tm computed 63.7
Tm predicted: 63.7
Optimal Tm: 63.2-68.5

20qtel
Range: 64760349_64760378F, 64762696_64762667R
Length: 2348
Forward: TCAGAGGGGGGCTGGACATTGAATGTGAAC (SEQ ID 127), Tm computed 63.5
Reverse: GTCACCATAGGACACAGACAGGAAGTGGGG (SEQ ID 128), Tm computed 63.1
Tm predicted: 63.3
Optimal Tm: [illegible]

20qtel
Range: 64754684_64754713F, 64759763_64759734R
Length: 5080
Forward: TAGAAATAACGACCAAAAGCCTCCCCTGTG (SEQ ID 129), Tm computed 60.4
Reverse: TTCAAGCTGTCAGGGACATCATGTTGAGAG (SEQ ID 130), Tm computed 60.4
Tm predicted: 60.4
Optimal Tm: 66.8

20qtel
Range: 64751135_64751104F, 64754267_64754238R
Length: 3133
Forward: TTTGTATGTTATTACCCTCGTTGTGCCATC (SEQ ID 131), Tm computed 57.9
Reverse: TCTCAGCCTCAGAAAATGCTTATGTTGAAG (SEQ ID 132), Tm computed 57.8
Tm predicted: 57.85
Optimal Tm: 64.6

20qtel
Range: 64745597_64745626F, 64749291_64749262R
Length: 3695
Forward: TTTTTTCCCTCCTGGCCTCACTCTTGCAAC (SEQ ID 133), Tm computed 62.7
Reverse: ATAGAAGGAAGCAGGACAACGGGGACAGAC (SEQ ID 134), Tm computed 62.9
Tm predicted: 62.8
Optimal Tm: 68.5-71.8

20qtel
Range: 64737952_64737981F, 64740366_64740337R
Length: 2415
Forward: CGGAAGTCAACAGTCACTGACGAGTCGGAG (SEQ ID 135), Tm computed 63.6
Reverse: AGAGTATAGGGACCAGCAGGAACACGGAGG (SEQ ID 136), Tm computed 63.6
Tm predicted: 63.6
Optimal Tm: 68.5-71.8

20qtel
Range: 64733540_64733569F, 64736582_64736553R
Length: 3043
Forward: GCACCAGCCCTTACCTTCCTCCCTTCACAG (SEQ ID 137), Tm computed 65.1
Reverse: ATATGGTAGGTGCTCACCACATGCAGGCCC (SEQ ID 138), Tm computed 65
Tm predicted: 65.05
Optimal Tm: [illegible]

20qtel
Range: 64728344_64728373F, 64733112_64733083R
Length: 4769
Forward: CCTTTCTCTACACCCTCCCACCTGCTGCTC (SEQ ID 139), Tm computed 64.7
Reverse: CACCCACCTCTCCCTGCCTCTAGTCTCTTC (SEQ ID 140), Tm computed 63.8
Tm predicted: 64.25
Optimal Tm: [illegible]

20qtel
Range: 64721595_64721624F, 64723760_64723731R
Length: 2166
Forward: CCCTACCCCAGATCCTGAGGATTCACATAG (SEQ ID 141), Tm computed 60.6
Reverse: GGGACAGTCAGAAACATCTCTGAAACCCTG (SEQ ID 142), Tm computed 60.6
Tm predicted: 60.6
Optimal Tm: 66.8

20qtel
Range: 64674392_64674424F, 64677388_64677354R
Length: 2997
Forward: GCTCAGTGCTCTCCCGCTCTCCTGCTTCTCTTC (SEQ ID 143), Tm computed 67.3
Reverse: ACTCAGCCTCTAATCAGCCTCTCTGCTCCACCCAC (SEQ ID 144), Tm computed 67.3
Tm predicted: 67.3
Optimal Tm: 75.5

21qtel
Range: 44855249_44855278F, 44859589_44859618R
Length: 4370
Forward: TAATGTATGCCCACAAATCTCCAGCGACCC (SEQ ID 145), Tm computed 62.2
Reverse: TCCAGCACCATCTCTGAACAACTACATGCC (SEQ ID 146), Tm computed 62.1
Tm predicted: 62.15
Optimal Tm: 68.5-71.8

21qtel
Range: 44876898_44876927F, 44878730_44878759R
Length: 1862
Forward: TCTAAGACCAAGTCGCTACACTCTTAACTG (SEQ ID 147), Tm computed 58
Reverse: CTTCTTTCAACCATAAAAGCCTTCCTCCTC (SEQ ID 148), Tm computed 58
Tm predicted: 58
Optimal Tm: 66.8

22qtel
Range: 47577168_47577197F, 47580377_47580406R
Length: 3239
Forward: TTCAGCGCCAGCCTCTTCGCTCCGTCCAAG (SEQ ID 149), Tm computed 68.6
Reverse: TGGTCAGGTGTGGGTCAGGAGACCCCAGCC (SEQ ID 150), Tm computed 68.8
Tm predicted: 68.7
Optimal Tm: 64.6-72.1

22qtel
Range: 47584046_47584075F, 47586361_47586390R
Length: 2345
Forward: GGGTCTCACATGTAGCATTCCTGGGCACAC (SEQ ID 151), Tm computed 64.1
Reverse: GTCCTCCCATTCCCATCCCTATCCCCACTG (SEQ ID 152), Tm computed 64.1
Tm predicted: 64.1
Optimal Tm: 72.1

22qtel
Range: 47593223_47593252F, 47596743_47596772R
Length: 3550
Forward: CAGGTAAGGGAGATGAGACCTCCAGACAAC (SEQ ID 153), Tm computed 61.1
Reverse: CCAAATACAGACACAGCCTCAACCCCATTC (SEQ ID 154), Tm computed 61.3
Tm predicted: 61.2
Optimal Tm: 66.8

Xptel
Range: 124934_124963F, 126829_126800R
Length: 1896
Forward: CGCAGGAAATAGGCAAACACACACTGGAAG (SEQ ID 155), Tm computed 62.0
Reverse: GGACCCTACACTGGATGGGTTTTAGCAGTC (SEQ ID 156), Tm computed 61.9
Tm predicted: [illegible]
Optimal Tm: 68.5

Xqtel
Range: 157753803_157753832F, 157756302_157756331R
Length: 2529
Forward: ATCCACAGCTTTGATCTAGGGAAAATAAAC (SEQ ID 157), Tm computed 56
Reverse: TGTGTTGGAAATGCAACTTAAATTGAACTG (SEQ ID 158), Tm computed 56.3
Tm predicted: 56.15
Optimal Tm: 61.8

Yptel
Range: 66941_66970F, 69357_69386R
Length: 2446
Forward: TATAGACACGTGACAAAGTAGCTGAAAGACC (SEQ ID 159), Tm computed 56.6
Reverse: TCTGTTTCTGTGTATGACTGCAATTTAACC (SEQ ID 160), Tm computed 56.3
Tm predicted: 56.45
Optimal Tm: 61.8

Yptel
Range: 72392_72421F, 74362_74391R
Length: 2000
Forward: CATGCTAAATTCATGGGCCATATTTTCAAC (SEQ ID 161), Tm computed 56.3
Reverse: GATGCAAAATGTTCATCTCACATCACAATC (SEQ ID 162), Tm computed 56.3
Tm predicted: 56.3
Optimal Tm: 61.8

*Coordinates are from the April 2001 version of the human genome draft sequence; F: coordinates
of forward primer, R: coordinates of reverse primer.

Potential probes are densely arrayed across the terminal chromosomal region and their
coordinates are precisely defined. The probes of the present invention span a range of distances
from the telomere of each chromosome arm, generally within the terminal bands of each
chromosome. Using individual single copy probes or these probes in combination, it is possible
to delineate the size of the chromosomal region that is involved in the rearrangement with high
precision, i.e., the length of a gain or loss, or the location of a breakpoint of a chromosomal
translocation or inversion.
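
As a rough illustration of how a panel of single copy probes at known distances from the telomere could bracket a terminal deletion breakpoint, the following Python sketch takes hypothetical hybridization results (signal present or absent on the rearranged homolog) and returns the interval that must contain the breakpoint. The function name, the data layout, and the example distances are illustrative assumptions and are not part of the described method; the sketch also assumes a simple terminal deletion.

```python
from typing import List, Optional, Tuple

# Each result: (distance of the probe from the telomere in kb,
#               True if a signal is seen on the rearranged homolog)
ProbeResult = Tuple[float, bool]


def bracket_terminal_deletion(
    results: List[ProbeResult],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (lower, upper) bounds, in kb from the telomere, for the breakpoint.

    Assumes a simple terminal deletion: every probe distal to the breakpoint
    gives no signal and every probe proximal to it gives a signal.
    """
    ordered = sorted(results)  # probe nearest the telomere first
    deleted = [dist for dist, present in ordered if not present]
    retained = [dist for dist, present in ordered if present]
    lower = max(deleted) if deleted else None    # most proximal deleted probe
    upper = min(retained) if retained else None  # most distal retained probe
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("pattern is not consistent with a simple terminal deletion")
    return lower, upper


# Hypothetical panel: probes at 80, 120 and 190 kb; the two distal probes give no signal.
print(bracket_terminal_deletion([(80.0, False), (120.0, False), (190.0, True)]))
# (120.0, 190.0)
```

The closer together the probes are spaced, the narrower the returned interval, which is the sense in which a dense probe array improves the precision of breakpoint mapping.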

Alterations in the short or p-arms of chromosomes 13, 14, 15, 21 and 22 and the long or
q-arm of the Y chromosome do not appear to contribute to clinical abnormalities. These regions
are comprised predominantly of repetitive sequences and their complete sequences have not been
determined. Therefore, probes for these regions were not developed; however, if these
chromosomal arms are found to contain unique single copy sequences, the present invention
provides a method of developing probes for these regions and applying them.

Table 2 summarizes results of single copy probes for all euchromatic chromosome ends.
Probes have been synthesized, hybridized and visualized to the chromosome specific terminal
bands for all chromosomes. As stated previously, multiple probes for several chromosomal ends
have been designed and validated. In Table 1, one probe for each of several chromosome terminal
bands (11q, 16p, 18p, 20p, and 22q) appears to detect paralogous or repetitive sequence families
on other chromosomes. The remaining probes in this table and all additional probes in Table 3
display the chromosomal specificity required for clinical application.



Comparison of localized scFISH and Recombinant Subtelomeric Probe Locations

scFISH probe¹ | Approx. length (bp) | SEQ ID NO. | Distance from telomere (kb)³ | Recombinant probe²: estimated clone size | Recombinant probe²: approximate distance of STS from telomere (kb)⁴
1ptel | 2531* | 82 | 1,045.411 - 1,047.942 | 90 kb | unknown
1ptel | 3930* | 34 | 1,048.515 - 1,052.445 | |
1ptel | 3512* | 35 | 1,053.361 - 1,056.873 | |
1ptel | 2671 | 33 | | |
1qtel | 1853 | [illegible] | [illegible] | 100 kb | 236 ± 100
1qtel | 1632* | [illegible] | [illegible] | |
1qtel | 2503 | 80 | 89.194 - 86.692 | |
2ptel | | 46 | 112.585 - 115.237 | [illegible] | [illegible]
2qtel | 3355 | 79 | 2,398.933 - 2,402.287 | 60 kb | 390 ± 46
3ptel | 2093* | 47 | 181.265 - 183.325 | | [illegible]
3ptel | 1834* | 49 | 199.161 - 200.994 | |
3qtel | 2953 | 48 | 762.774 - 765.726 | 95 kb | 997 ± 95
3qtel | 2022* | 247 | 595.753 - 593.731 | |
4ptel | 1796 | 51 | 246.384 - 248.179; 417.863 - [illegible] | 145 kb⁶ | (220-292) ± 73
4qtel | 2426 | [illegible] | [illegible] | 130 kb | 930 ± 130
5ptel | 2189 | [illegible] | | 191 kb | unknown
5qtel | 2795 | 54 | 2,032.602 - 2,035.396 | 105 kb | 227 ± 105
5qtel | 2661 | [illegible] | [illegible] | |
5qtel | [illegible] | [illegible] | [illegible] | |
5qtel | 1753* | [illegible] | [illegible] | |
6ptel | 2152 | 248 | [illegible] | 80 kb | unknown
6qtel | 2554 | 57 | 175.551 - 178.104 | 100 kb | (276-282) ± 94
7ptel | 2872 | 61 | 815.565 - 818.439 | 60 kb | 218 ± 59
7ptel | [illegible] | 78 | 143.257 - 145.691 | |
7ptel | [illegible] | 59 | [illegible] | |
7qtel | 2574 | 60 | 1,095.575 - 1,098.148 | 95 kb | 225 ± 95
7qtel | 1517* | 75 | 28.945 - 27.428 | |
7qtel | | 76 | 5.405 - 3.771 | |
7qtel | 1865* | 74 | 81.313 - 79.448 | |
8ptel | 2079 | 64 | 483.728 - 485.805 | 135 kb | 1,200 ± 135
8ptel | 2271 | 249 | 455.377 - 457.645 | |
8qtel | 2154 | 63 | 71.870 - 74.023 | 100 kb⁶ | 194 ± 100
8qtel | 2949 | 250 | 145.868 - 148.816 | |
9ptel | 1754 | 251 | 243.057 - 244.809 | 115 kb | 140 ± 115
9qtel | 2232 | 66 | 248.993 - 251.226 | 95 kb | 223 ± 95
9qtel | 2707 | 65 | 231.636 - 234.340 | |
9qtel | 2278 | 67 | 257.634 - 259.785 | |
10ptel | 2132^ | 5 | 363.852 - 365.942 | 80 kb⁶ | 328 ± 80
10ptel | 2051 [marker illegible] | 2 | 320.896 - 322.898 | |
10ptel | 3203+ | 4 | 282.669 - 285.872 | |
10ptel | 2526^ | 3 | 151.566 - 154.092 | |
10qtel | 1820 | | 184.961 - 186.780 | |
11ptel | 2884 | 8 | 1,205.118 - 1,208.002 | 110 kb⁶ | 290 ± 110
11ptel | 2489* | 9 | 66.589 - 69.078 | |
11qtel | 2462 | 7 | 1,781.588 - 1,784.049 | 160 kb | unknown
11qtel | 2026* | 6 | 33.471 - 31.445 | |
12ptel | 1914 | [illegible] | 180.472 - 182.385 | 100 kb | 0-209
12qtel | 3456 | 10 | 154.406 - 157.861 | 165 kb | 180 ± 165
13qtel | 3209 | 12 | 366.172 - 369.380 | 75 kb | 2,900 ± 75
14qtel | 1866 | 16 | 3,155.170 - 3,157.035 | 160 kb | (4,100-4,200) ± 117
14qtel | 3839 | 15 | 3,128.031 - 3,131.869 | |
14qtel | 1984* | 13 | 1,022.102 - 1,020.118 | |
14qtel | 2617* | 14 | 1,019.175 - 1,016.558 | |
15qtel | 1607 | 17 | 131.552 - 133.158 | 100 kb | 420 ± 100
16ptel | 3361* | 20 | 73.825 - 77.186 | 110 kb | 3056 ± 110
16ptel | 2082* | 19 | 56.610 - 58.692 | |
16qtel | 2567 | 18 | 183.506 - 186.072 | 110 kb⁶ | 210 ± 110
17ptel | 2593 | 23 | 895.021 - 897.613 | 70 kb⁶ | 105 ± 70
17ptel | 4984 | 22 | 859.347 - 864.330 | |
17ptel | 2219* | 21 | 101.957 - 104.176 | |
17qtel | 6191* | [illegible] | [illegible] | 160 kb | 750 ± 160
17qtel | 3026 | 245 | 848.341 - 871.383 | |
18ptel | 2368 | 246 | 336.408 - 338.775 | 160 kb | 209 ± 160
18qtel | 2530 | 26 | 80.057 - 82.584 | 170 kb | (154-285) ± 40
19ptel | 1815 | 30 | 1,745.686 - 1,747.500 | 80 kb | unknown
19ptel | 2094 | 27 | 1,721.659 - 1,723.752 | |
19ptel | 2400* | 29 | 265.605 - 268.005 | |
19ptel | 4137* | 28 | 249.688 - 253.825 | |
19qtel | 2721 | 31 | 121.866 - 124.586 | | 244 ± 160
19qtel | 2399 | 32 | 88.475 - 90.874 | |
20ptel | 2616 | 39 | 365.951 - 368.566 | 160 kb | 0-240
20qtel | 3133 | 43 | 109.581 - 112.713 | | 62-202
20qtel | 3695 | 42 | 114.557 - 118.251 | |
20qtel | 2166 | 41 | 140.088 - 142.253 | |
20qtel | 2997 | 40 | 186.460 - 189.456 | |
21qtel | 4370 | 44 | 47.861 - 52.230 | 170 kb | 0-337
22qtel | 3550 | 45 | 176.274 - 178.618 | 80 kb | (161-168) ± 73
Xptel | 1896 | 69 | 2,329.080 - 2,330.975 | 175 kb | [illegible] (X,Y homology)⁸
Xptel | 3700* | 70 | 155.557 - 159.257 | |
Xqtel | 2529 | 71 | 645.399 - 647.927 | |
Yptel | 2446 | 72 | 2,562.365 - 2,564.810 | 175 kb | Unknown (X,Y homology)⁸
Yptel | 2000 | 73 | 2,567.816 - 2,569.815 | 170 kb |





1 scFISH probes developed from the April 2003 genome draft are labeled with an asterisk (*). The remaining probes were from the April 01 draft
except 1p (Nov 02), 2q, 3q, 4p, 5p, 6, 8q, 9p (June 02). Sequence IDs corresponding to these probes contain the UCSC database
version number in the descriptions of these products.

2 Many of the conventional FISH probes were developed by Knight et al., Am. J. Hum. Genet. 67: 320, 2000, and by Abbott
Laboratories/Vysis, Inc.

3 The distance from the probe to the end of the telomere reported in this table is based on the length of the interval from the probe boundary
coordinates to the terminal nucleotide coordinates of each chromosome end in the April 03 version of the genome sequence. The
computer program BLAT at the Genome Browser website (genome.ucsc.edu) was used to determine these coordinates. Due to
inaccuracy in the BLAT algorithm, the coordinates of probe boundaries may differ slightly from the actual coordinates.

4 The position of the STS/marker associated with the conventional FISH probe was determined in the April 03 version of the genome
sequence. Often a single STS/marker is identified on a clone. There is insufficient information available to determine the positions of
STS markers on some of these clones. As a result, the error in positioning a probe on the chromosome (i.e., ±) is generally the size of the
clone provided in American Journal of Human Genetics 67: 320, 2000, and by Abbott/Vysis, Inc. A standard deviation less than
the estimated clone size indicates that more than one STS was localized to the clone.

indicates clones with cross hybridizations to other chromosomes. 

7 Probe recognizes a neighboring paralogous sequence in addition to the known interval. 

8 Reported STS located on X chromosome only, but both commercial probes for sex chromosomes show homology with each other. 

9 Probe detects four paralogs: three of which are on chromosome 9 and one of which is on chromosome 2.

unknown = Reported STS/ markers could not be placed on genome sequence as they could not be located in all available genome 
databases or through communication with authors. 

^ Hybridization was detected when the probe was combined with other 10ptel probes labeled with "^".
+ Hybridization was detected when the probe was combined with other 10ptel probes labeled with "+".

Table 3 compares the location of the corresponding single copy probe with the distance
between the end of the available chromosomal sequence and the subtelomeric STS contained
within the cloned subtelomeric probe. Commercially available cloned subtelomeric probes (e.g.
from Vysis, Inc.) have been positioned on the genome sequence (April 2003 version) based upon
one or more sequence tagged sites (STS) contained within them. These STS markers, however,
represent a very short interval within the larger cloned segment; therefore, it is not possible to
delineate the proximal or distal boundary of the clone from the STS, but the approximate
genomic location of the clone can be inferred from the location of the STS. Given the known
length of a clone and the STS coordinate, it is possible to bracket a range of genomic
coordinates covered by that clone. As noted in Table 3, the majority of the single copy probes
developed with the present invention are considerably closer to the end of the chromosome than
the cognate recombinant probe. The largest differences in distances between the locations of the
single copy probes of the present invention and available cloned subtelomeric probes are found
for 8pter, 13qter, 14qter, and 16pter, where the single copy probes are ~800 kb or more closer
to the ends of these chromosomes. The distal 8pter interval separating the single copy probes and
the conventional probe contains 4 or more genes that, if deleted, would not be detected with the
cloned probe but would be detected with the single copy probe. The distal 13qter region (see
Fig. 17) contains over 10 confirmed or predicted genes, the distal 14qter region contains 3 confirmed
genes and 30-40 predicted genes, and the 16pter region has more than 200 confirmed and
predicted genes. Well-characterized loci in 8p distal to the existing cloned subtelomeric FISH
probe, for example, include genes encoding a member of the p53 binding protein family, an
interferon induced protein 15 family member, beta-2-like guanine nucleotide-binding protein
(which has a role in protein kinase C mediated signaling), and a sequence related to the C5A
receptor (which is required for mucosal host cell defense in the lung). The 14qter region that is
distal to the cloned subtelomeric probe contains the JAG2 gene, a ligand of the Notch receptor,
which has essential roles in craniofacial morphogenesis, limb, thymic development and cochlear
hair cell development. It is apparent that loss of a single allele in any of these genes (and others
that have not been as thoroughly characterized) will have an adverse clinical outcome. The single
copy probes developed for the present invention are the only currently available subtelomeric
FISH probes capable of detecting hemizygosity at these loci.
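
The bracketing of a clone's possible extent from its STS position and clone length, described above, can be sketched as follows. This is a minimal Python illustration under stated assumptions: positions are measured in kb from the end of the available chromosomal sequence, the STS may lie anywhere within the clone, and the function names are hypothetical. The example values follow the 7qtel rows of Table 3 (95 kb clone, STS about 225 kb from the end, single copy probe ending about 28.9 kb from the end).

```python
from typing import Tuple


def bracket_clone_interval(sts_kb: float, clone_kb: float) -> Tuple[float, float]:
    """Bracket the possible extent of a clone that contains an STS.

    Because the STS may sit anywhere inside the clone, the clone can reach at
    most one clone length to either side of the STS. The chromosome end (0 kb)
    caps the distal side.
    """
    if sts_kb < 0 or clone_kb <= 0:
        raise ValueError("STS position must be >= 0 and clone size must be > 0")
    distal = max(0.0, sts_kb - clone_kb)
    proximal = sts_kb + clone_kb
    return distal, proximal


def probe_is_distal_to_clone(probe_far_edge_kb: float, sts_kb: float, clone_kb: float) -> bool:
    """True if the probe ends closer to the telomere than any possible clone placement."""
    distal, _ = bracket_clone_interval(sts_kb, clone_kb)
    return probe_far_edge_kb < distal


print(bracket_clone_interval(225.0, 95.0))            # (130.0, 320.0)
print(probe_is_distal_to_clone(28.945, 225.0, 95.0))  # True
```

When the second function returns True, a deletion confined to the region between the probe and the distal bracket would be missed by the cloned probe but could be detected by the single copy probe.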

A representative composite panel of 12 subtelomeric single copy probes (or probe
combinations) hybridized to normal metaphase chromosomes is shown in Figure 1. Each panel
indicates the telomere detected and the approximate size of the probe (sizes correspond to the
"Approximate size" column from Table 1). The arrows indicate the probe hybridizations to the
chromosomal ends. Each of the probes specifically hybridizes to the homologous chromosome
pair from which the sequence is derived.
Table 1 summarizes all of the probes that had been hybridized by September 2002, listing the
chromosome, primer coordinates, chromosome end, and approximate and precise sizes of the
amplified single copy products. Multiple products from the same subtelomeric region have been
individually hybridized except for chromosome 10p, which was hybridized in combination with
other 10p probes. As shown in that Table, some probes (e.g. 18ptel) exhibited cross
hybridization and some (e.g. 22q) required additional verification prior to ruling out cross
hybridization. Furthermore, a 16p probe cross-hybridized despite C₀t-1 suppression.

Table 2 indicates the primers used to amplify each of the probes, the coordinates and the
sequences of the primers [derived from the April 2001 version of the human genome sequence
(available online at the genome browser website at the University of California Santa Cruz)],
the predicted and then experimentally optimized annealing temperatures for the primers in the
amplification reactions that generated the PCR products, and the lengths of the amplification
products generated with these primers. In general, the optimal annealing temperature was found
to lie within 5 degrees C of the predicted annealing temperature. After optimization of the PCR
reaction conditions, all of the products indicated in Table 2 produced single homogeneously
stained bands by electrophoresis or single sharp peaks in absorbance at a specific timepoint on
the DHPLC-Wave system (Transgenomic, Omaha). A subset of these products was labeled and
localized to human metaphase chromosomes and is included in Table 3. Table 3 includes the
probes from Table 1 that did not cross hybridize to other regions as well as additional probes that
we have hybridized to chromosomes since September 2002. The more recently mapped probes
have been developed from the April 2003 version of the genome sequence and in many instances
are closer to the chromosomal ends. Table 3 gives the precise size of the single copy probe and
compares its distance from the chromosomal end with that of the synthetic commercial probes.

We observed a number of probes with genomic paralogs detected by molecular
cytogenetic analysis, but not by sequence analysis of the April 2001 genome sequence or
subsequent versions, indicating that the genome sequence is incomplete in the regions containing
these paralogous sequences. Complex paralogous domains have also been shown to produce
incorrect assemblies of these regions, and this could result in the merging of the paralogous,
non-allelic copies into fewer genomic loci. Therefore, probes designed according to this method must
be validated by hybridization to normal controls prior to their application to detection of
unbalanced rearrangements in patients. This approach may turn out to be useful in identifying
potentially misassembled regions in future versions of the human genome sequence.
Cross-hybridization to unsequenced or incorrectly sequenced genomic regions has precedent (see
previous Continuation in Part application; US Serial #09/854,867, the teachings and content of
which are hereby incorporated by reference). Previously, we developed probes from two regions
in which closely spaced, highly similar (>95%) paralogous sequences have been localized. The
regions include the Down syndrome region on chromosome 21q and the chromosome 16p
inversion region for type M4 acute myelogenous leukemia. Both probes hybridized to paralogs
on their respective chromosomes but also hybridized to the short arms of acrocentric
chromosomes. In these instances, cross-hybridization was suppressed by preannealing with
highly repetitive DNA.

Probes that hybridize to paralogous sequences on other chromosomes or at distant
loci (>1 Mb) on the same chromosome compromise the specificity of the assay for detecting
abnormalities of the telomere that the probe is designed to detect. In such cases, the sequences
in the probe with paralogy to other chromosomal loci have been eliminated. The preferred
approaches for eliminating such sequences include (1) selecting and producing alternate probes
from the neighboring chromosomal intervals or (2) redesigning probes to eliminate the
subsequences that are paralogous to other chromosome loci. Since single copy intervals of
suitable size for single copy FISH are densely arranged in the genome, we have generally
preferred to develop new probes from adjacent genomic intervals. This approach is less time
consuming and less labor intensive than bisecting a probe with paralogous counterparts; however,
probe bisection is, in some instances, the only alternative, especially if a probe derived from a
particular (small) gene is required. Marked entries in Tables 1 and 2 indicate examples of
alternate single copy hybridization probes for telomeres where paralogies to other chromosomes
had been initially observed.

Discussion: 

We have developed, tested, and validated a method of producing single copy probes that
will detect chromosome rearrangements involving most of the human subtelomeric regions,
developed chromosome arm-specific probes for the 42 euchromatic terminal regions, and
demonstrated that 56 are clearly closer to the ends of these chromosomes or fall within the range of
potential locations for the commonly-used cloned probes but could be closer if the precise
locations of the cloned probes could be determined. These single copy probes can therefore
detect smaller and more terminal chromosomal imbalances involving subtelomeric sequences
than existing probes. We infer that these probes will have greater sensitivity in detecting
idiopathic mental retardation and other clinical abnormalities that result from this type of
aneuploidy. The locations of the probes on the chromosomes are shown in Figs. 2-13; Fig. 1
is a compilation of Figs. 2-13 and was prepared using the raw photos of these Figs.
Fig. 14 shows the location of 19qtel, which is not represented in Fig. 1.

Thus, the present invention provides methods of determining and developing
subtelomeric DNA probes which are smaller than were previously available and usually closer
to the telomere. These smaller probes are able to detect smaller mutations, deletions, and
rearrangements that larger probes are unable to detect due to their size. Moreover, some
mutations, deletions, and rearrangements may actually occur within the sequence of the larger
probes; such changes could not have been detected using the larger probes but can be detected
using the methods and probes of the present invention. The probes of the present invention are
able to detect chromosomal rearrangements which are closer to the ends of the chromosomes than
was previously possible. This is because the probes of the present invention are
developed by starting at the very end of each arm of each chromosome and working inward to
find one or more unique sequences which are then used to develop corresponding probes.
Cross-hybridizing sequences are preferably eliminated computationally, that is to say, the sequences
identified are compared to known sequences such that there will be little to no
cross-hybridization, rather than by experimentally determining whether or not a probe
cross-hybridizes. Specific examples of subtelomeric probes of the present invention have been
developed using the primers identified herein as SEQ ID Nos. 83-244.

Example 2 

This example describes the design, synthesis, validation and hybridization of an 18qtel
(2530 bp) probe.

Materials and Methods:

A probe from the subtelomeric interval on the long arm of chromosome 18 was developed
on 7/30/2001 from the human genome sequence published on April 1, 2001. Sequences from
this chromosome were downloaded and analyzed with custom software that was developed to
automatically identify prospective single copy intervals and select primer sequences for the
polymerase chain reaction. Of course, any method that will identify prospective single copy
sequences can be used for purposes of the present invention. A Unix script, integrated_single
copy FISH, manages the process. The user is requested to provide the version of the human
genome sequence from which probes are designed, the coordinates of the chromosomal region,
and the minimum length of the single copy interval. The minimum length of this interval was
chosen to be 1500 nucleotides, based on ease of visualization of FISH probes by fluorescence
microscopy. The software will, however, identify single copy intervals of any desired size. An
interval containing the terminal 349,999 bp was input and the script retrieved this sequence from
the genome browser at the University of California-Santa Cruz website. A Perl program,
findirepeatmask.pl, then computed the coordinates of all >1500 bp intervals from the output of
the RepeatMasker program (Smit A and Green P, University of Washington). The Delila
program xyplo, at the ncifcrf website, displayed a scatterplot indicating the locations of the single
copy intervals. The script then called a series of sequence analysis programs (Wisconsin
package, from accelrys.com), first extracting sequences of each single copy subinterval from the
larger sequence, and then selecting oligonucleotide primer sequences optimized for long PCR
for each subinterval. The chromosome 18 subinterval from 83,779,017 to 83,879,017 was
selected for primer design. Primer selection was performed with a Perl script (primwrapper.pl,
which executes the Wisconsin program prime) by dynamically decrementing primer annealing
temperature, product G/C composition and interval length beginning with the most stringent
conditions, as we have previously described (Rogan et al. Genome Research, 11:1086-1094,
2001, the content and teachings of which are incorporated by reference). Design of a set of
potential probes in the 350 kb genomic region required ~1 hour on a 300 MHz Unix workstation.
For this chromosome 18 interval, the software offered 25 potential intervals for this long PCR
reaction. We selected product 22, which is between 80,057 and 82,584 bp from the end of the
given sequence in the "finished" April 2003 genome reference sequence. In the April 2001
sequence, this chromosome 18 sequence was not completed and the probe sequence fell between
43,227 and 45,756 bp from the end of the available sequence. Even though the RepeatMasker
software screens the sequence for repetitive sequence families that are common in the human
genome, this software does not detect complex paralogous or low copy number segmental
duplicated regions in the genome that do not technically meet the criterion of a repetitive
sequence. The single copy composition of this sequence was therefore verified computationally
with the BLAT tool at the UCSC Genome Browser website. This tool rapidly determines
whether other sequences in the genome are related to a query, and if so the length and the percent
similarity of those sequences relative to the query. A script was developed to automate this
BLAT procedure for multiple intervals simultaneously. Related sequences less than or equal to
500 bp in length or <1000 bp sequences with more than 30% divergence were unlikely to
cross-hybridize to the probe under the hybridization and wash stringency conditions used to detect
chromosomal sequences. Sequences that exceeded these thresholds were generally rejected as
potential probes; however, no such related sequences were detected computationally for the
18qtel region.
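By way of illustration only, the two computational screens described above (locating unmasked intervals of at least the minimum length, and applying the length and divergence thresholds to related sequences) may be sketched as follows. This is a minimal Python sketch and not the Perl software referred to herein; the function names, the representation of repeats as (start, end) pairs, and the example coordinates are assumptions introduced for illustration, and no RepeatMasker or BLAT output format is parsed.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def single_copy_intervals(region_start: int, region_end: int,
                          repeats: List[Interval],
                          min_len: int = 1500) -> List[Interval]:
    """Return the gaps between masked repeats that are at least min_len long.

    Hypothetical helper: 'repeats' stands in for coordinates taken from a
    repeat-masking step; 1500 is the minimum length quoted in the text.
    """
    found: List[Interval] = []
    cursor = region_start  # first position not yet covered by a repeat
    for rep_start, rep_end in sorted(repeats):
        if cursor > region_end:
            break
        gap_end = min(rep_start - 1, region_end)
        if gap_end - cursor + 1 >= min_len:
            found.append((cursor, gap_end))
        cursor = max(cursor, rep_end + 1)  # also handles overlapping repeats
    if region_end - cursor + 1 >= min_len:
        found.append((cursor, region_end))
    return found


def may_cross_hybridize(hit_length_bp: int, percent_divergence: float) -> bool:
    """Literal reading of the thresholds in the text.

    A related sequence is treated as harmless if it is <= 500 bp long, or if
    it is < 1000 bp long with more than 30% divergence; anything else is
    flagged so that the candidate probe can be rejected or redesigned.
    """
    if hit_length_bp <= 500:
        return False
    if hit_length_bp < 1000 and percent_divergence > 30.0:
        return False
    return True


# Made-up coordinates, for illustration only.
example_repeats = [(4500, 9000), (2001, 2300), (4000, 4100)]
example_intervals = single_copy_intervals(1, 10000, example_repeats)
# example_intervals == [(1, 2000), (2301, 3999)]
```

A candidate interval would be retained only if none of its related sequences is flagged by may_cross_hybridize; as stated above, experimental validation on normal controls is still required.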

The PCR primers that amplify this product consisted of a 30-mer forward and a 32-mer
reverse strand (SEQ ID NOS 193 and 194). These DNA primers were synthesized by IDT Inc.
(Coralville IA), resuspended in 500 ul of double distilled H2O, and then diluted to a working stock
concentration of 10 uM. Initially, the primers were tested for their ability to produce an
amplification product of the expected size, i.e. 2530 bp, based on their respective coordinates
in the genome. The test PCR reaction comprised a total of 25 ul and consisted of the forward
and reverse primers (each at 0.9 uM), 30 ng of human genomic high molecular weight DNA
(stored at 4 deg C; Promega, Madison WI), 1.5 mM MgSO4, 0.625 units of Platinum Pfx
polymerase, 10X Reaction buffer, 1.25 mM dNTPs, and 1X PCR Enhancer solution (components
and conditions from the manufacturer Invitrogen, Carlsbad CA). The initial amplification was
carried out at the melting temperature predicted by the primer design program, 60 deg C.
Agarose gel electrophoresis revealed the product had the expected size; however, additional
reaction optimization was needed to obtain a homogeneous product. The Biomek 2000
laboratory automation workstation was used to set up simultaneously a set of parallel reactions
for this 18qtel product and other products for other subtelomeric regions. For temperature optimization,
these parallel reactions were each amplified by PCR at a different annealing temperature,
specifically 53.2, 55.5, 58.4, 61.8, 64.6, and 66.8 deg C, on a gradient thermal cycler (MJ Research
Alpha) with the same reaction conditions as above, except that the primers were added at 0.3 uM
in the optimizing reactions. The thermal cycling conditions were: initial denaturation of
genomic template for 2 minutes at 94 deg C, followed by 15 cycles at the above annealing and
extension temperatures for 5 minutes and denaturation for 20 minutes. This was followed by an
additional 15 cycles at the same temperatures, but the annealing and extension step was increased
in duration by 5 minutes per cycle. After a primer extension polishing step at 68 deg C for 10
minutes, the reaction was chilled and held at 0 deg C. The products were separated by agarose
gel electrophoresis and inspected to identify the temperature that gave the maximum yield of the
purest product. The optimum temperature for production of this probe was found to be 64 deg C. The
reaction was scaled up to a 200 ul final volume (i.e. ~2 ug) to prepare sufficient amounts of PCR
product for labeling and several fluorescence in situ hybridization assays. The product was
separated on a preparative agarose gel, and the band was excised and purified using a Montage
extraction spin column (Millipore, Watertown MA). The eluate from the column was
precipitated with ethanol, briefly desiccated, and resuspended in double distilled water at a
concentration of 100 ng/ul. Approximately 1 ug of product was recovered. This solution was
labeled by nick-translation with either digoxigenin-modified or biotinylated dUTP as described
in Rogan et al (2001). This procedure provided sufficient amounts of probe for denaturation and
hybridization to 5 slides containing metaphase and interphase chromosomes from normal
individuals and patient specimens.
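As a worked illustration of the reaction set-up arithmetic, the volume of primer working stock per reaction follows from C1 x V1 = C2 x V2. The Python sketch below assumes that the 10 uM working stock is pipetted directly into the reaction and that the optimizing reactions were also 25 ul; neither point is stated explicitly above, and the function name is hypothetical. It also checks the observed optimum of 64 deg C against the predicted 60 deg C.

```python
def stock_volume_ul(stock_conc: float, final_conc: float,
                    final_volume_ul: float) -> float:
    """Volume of stock satisfying C1 * V1 == C2 * V2.

    Both concentrations must be in the same units (here uM).
    """
    if stock_conc <= 0 or final_conc < 0 or final_conc > stock_conc:
        raise ValueError("concentrations must satisfy 0 <= final <= stock, stock > 0")
    return final_conc * final_volume_ul / stock_conc


# Values quoted in the text; the 25 ul volume for the gradient runs is an assumption.
WORKING_STOCK_UM = 10.0
REACTION_VOLUME_UL = 25.0

test_primer_ul = stock_volume_ul(WORKING_STOCK_UM, 0.9, REACTION_VOLUME_UL)      # 2.25 ul of each primer
gradient_primer_ul = stock_volume_ul(WORKING_STOCK_UM, 0.3, REACTION_VOLUME_UL)  # 0.75 ul of each primer

# The observed optimum lies within 5 deg C of the predicted annealing temperature.
PREDICTED_C = 60.0
OBSERVED_OPTIMUM_C = 64.0
within_five_degrees = abs(OBSERVED_OPTIMUM_C - PREDICTED_C) <= 5.0  # True
```

The same function applies to any other component for which both a stock and a final concentration are known.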

Results: 

Experimental validation of the probe showed that it did not hybridize to any other
chromosomal region in cells from a normal individual with a normal karyotype, consistent with
the computational prediction that this sequence was present in a single copy in the genome. This
probe, having passed both computational and experimental validation, was selected based on its
close proximity to the terminus of chromosome 18q for analysis of a patient thought to carry a
terminal rearrangement of this chromosome. Figure 18 shows an example of this probe detecting
a translocation of this sequence to the terminal band on the p arm of chromosome 6 in a patient
with a 6;18 translocation. In this figure, an 18q subtelomeric probe (2530 bp in length) is
hybridized to an abnormal metaphase cell. This cell has a translocation between the short arm
of one chromosome 6 and the terminal chromosomal band on one chromosome 18. The locations
of the translocation sites are indicated by arrows on the normal G-banded chromosome 6 and
normal G-banded chromosome 18. The translocated or derivative (der) G-banded chromosomes
6 and 18 are also included. The position of the 18q probe is indicated in red. The chromosome
18q probe (detected in red) is hybridized to the normal chromosome 18 and the derivative
chromosome 6 as shown in the left panel. The derivative chromosome 18 does not hybridize as
its subtelomeric region has been exchanged with chromosome 6p genetic material



